ABSTRACT

As members of the instrument team for the Advanced CCD Imaging Spectrometer (ACIS) on NASA's 
Chandra X-ray Observatory and as Chandra General Observers, we have developed a wide variety of data 
analysis methods that we believe are useful to the Chandra community, and have constructed a significant 
body of publicly-available software (the ACIS Extract package) addressing important ACIS data and science 
analysis tasks. This paper seeks to describe these data analysis methods for two purposes: to document the 
data analysis work performed in our own science projects, and to help other ACIS observers judge whether 
these methods may be useful in their own projects (regardless of what tools and procedures they choose to 
implement those methods). 

The ACIS data analysis recommendations we offer here address much of the workflow in a typical ACIS
project, including data preparation, point source detection via both wavelet decomposition and image
reconstruction, masking point sources, identification of diffuse structures, event extraction for both point and diffuse
sources, merging extractions from multiple observations, nonparametric broad-band photometry, analysis of
low-count spectra, and automation of these tasks. Many of the innovations presented here arise from several,
often interwoven, complications that are found in many Chandra projects: large numbers of point sources
(hundreds to several thousand), faint point sources, misaligned multiple observations of an astronomical field, point
source crowding, and scientifically relevant diffuse emission.

Subject headings: methods: data analysis; methods: statistical; techniques: image processing; X-rays: general 



1. INTRODUCTION



Since its launch in 1999, the Chandra X-ray Observatory (Weisskopf et al. 2002) has revolutionized X-ray
astronomy. Chandra provides remarkable angular resolution (unlikely to be matched by another X-ray observatory
within the next two decades), and its most commonly used instrument, the Advanced CCD Imaging Spectrometer
(ACIS), produces observations with a very low background (Garmire et al. 2003). These two technical capabilities
allow detection of point sources with as few as ~5 observed X-ray photons (commonly referred to as "events" or
"counts"), a data analysis regime unique among X-ray observatories. Observations of Galactic star clusters and
mosaics of nearby galaxies or extragalactic deep fields often produce hundreds to thousands of weak X-ray sources.
Chandra's excellent sensitivity to point sources and angular resolution also provide a unique capability for studying
diffuse emission superposed onto those point sources, since the point sources can be effectively identified and then masked.

For many types of ACIS "imaging" studies, most observers follow a data analysis workflow that is similar to
that outlined in Figure 1. Relatively raw data derived from satellite telemetry, known as "Level 1 Data Products"
(L1), are passed through a variety of repair and cleaning operations to produce "Level 2 Data Products" (L2) that
are appropriate for analysis. A common workflow for studying point sources (solid boxes and arrows on left side of
Figure 1) consists of binning the L2 event data into one or more images that are searched for sources. The events
and background associated with each point source in the catalog are "extracted" and calibrated. Observed source
properties (e.g., count rates, apparent fluxes, spectra, light curves) are estimated, then combined with calibration
products to estimate intrinsic astrophysical source properties. A common and similar workflow for studying diffuse
emission (right side of Figure 1) consists of removing ("masking") the point sources from the data, constructing
images, identifying several regions of diffuse emission to study, and then extracting and analyzing those diffuse
sources in a manner similar to that used for point sources.
Fig. 1. The data analysis workflow described here for a single ACIS field containing multiple point sources (PS,
left branch) and diffuse sources (DS, right branch). 



Many Chandra studies exhibit one or more of five characteristics that significantly complicate this familiar 
workflow. 



1. In many studies hundreds to thousands of point sources can be readily identified; executing the workflow is 
intractable without significant automation. 

2. Numerous sources with very few detected counts are identified in most Chandra studies; common statistical 
methods based on large-N assumptions break down at several points in the workflow. 

3. Many studies require covering a large field of view with multiple Chandra pointings (e.g., Figure 2). For many
reasons, such mosaicked pointings usually overlap significantly and/or are observed at a variety of roll angles.
Thus, many sources are observed multiple times at very different locations on the ACIS detector. Analysis of such
sources can be very complex, because the Chandra point spread function (PSF) exhibits large variations in size
and shape across the focal plane. Analysis of diffuse emission is also complicated by multiple pointings, since a
single diffuse region may be only partially covered by a particular ACIS observation.



See Figure 8 and Figure 10 in the HRMA User's Guide.
4. Some fields (such as deep exposures of rich star clusters, the Galactic Center, or nearby galaxies) are crowded,
with adjacent PSFs in close proximity. In such conditions, simple approaches to source extraction and background 
estimation are not adequate. 

5. Scientifically relevant diffuse emission must often be extracted and studied while avoiding contamination from 
point sources (e.g., in star forming regions, the Galactic Center, and galaxy clusters in deep fields). 
Fig. 2. Exposure map for the Chandra Carina Complex Project (Townsley et al. 2010) study of the Carina Nebula,
comprising 22 ACIS-I pointings (38 observations), with point sources masked (§5.4) in preparation for extraction
of diffuse emission (§9). All point sources are masked in the individual, high-resolution, exposure maps; at this scale
only the larger masks (on mostly off-axis sources) are visible.



Since our Chandra studies often exhibit all five of these issues, we have developed a set of data analysis methods 
that incorporate a variety of enhancements to standard techniques. Publicly available software implementing these 
methods — the ACIS Extract package — has been cited in publications for at least 50 Chandra targets. 

This paper seeks to describe these methods at a moderate level of detail. Our primary purpose is to encourage 
other ACIS observers to consider whether the various departures from standard techniques described here may 
be useful in their own projects (regardless of what tools and procedures those observers use to implement those 
methods). Our secondary purpose is to document the data analysis methods used in our own Chandra studies. In 
many cases, the only documentation available for software and data analysis methods is on-line; thus this paper is
liberally footnoted with relevant URLs. We acknowledge that URLs are more ephemeral than journal citations, but
we believe that they are better than no documentation at all.

The high-level structure of our data analysis workflow differs from standard practice in two ways. First, for
technical reasons, the optimal data cleaning steps for point sources and diffuse sources differ for ACIS data (§3); thus
Figure 1 shows separate "cleaning" operations on those two branches of the analysis. Second, since the point source
extraction process generates estimates of source position (§7.1) and source validity (§4.3) that are expected to be
better than the estimates made by typical source detection procedures, we choose to adopt an iterative workflow (§4.1
and dashed boxes and lines in Figure 1) in which the point source detection process merely nominates candidates
that are then repositioned and possibly discarded as likely noise peaks after extraction results are in hand.

Most of this paper describes the individual data analysis tasks implied by Figure 1, emphasizing the changes
to standard methods that we have adopted. We assume the reader is familiar with ACIS data and with standard
analysis methods, both of which are well described by the Chandra Science Threads. Throughout the text, we make
liberal use of footnotes that direct the reader to on-line documentation, much of it provided by the Chandra X-ray
Center (CXC), that is useful for understanding ACIS data analysis and issues. The process of preparing L2 data
products is discussed in §3. Point source detection is reviewed in §4. Extraction of point sources and background
estimation are presented in §5. Section 6 describes our approach to handling multiple observations of a source.
Estimation of observed and intrinsic source properties is discussed in §7. In §9 we describe modifications of point
source methods for use on diffuse sources.

It is critical for the reader to recognize that the methods described here are not unique, definitive, or optimal for 
all purposes. Alternative approaches to ACIS data analysis are provided by the Chandra Source Catalog developed 
by the CXC, and by analysis tools such as XAssist (Ptak & Griffiths 2003) and yaxx (Aldcroft 2006) developed by
other researchers. 



2. THE ACIS EXTRACT PACKAGE



Although the focus of this paper is to discuss data analysis techniques, rather than implementation of those 
techniques, in fact most of the software and recipes that we use for ACIS data analysis are publicly available in a 
package called ACIS Extract (AE), which was first released to the community in 2002. This paper will refer to AE
when discussing any data analysis method that is implemented in ACIS Extract. We do not attempt to describe here
either the full capabilities of AE or how to use the software, but instead refer the reader to the extensive AE manual 
and recipes. AE can also be applied to X-ray data from the EPIC instrument on the XMM-Newton observatory, 
with reduced capabilities and documentation. 

AE significantly automates the extraction and analysis of both point-like and diffuse sources; our largest project
to date (Townsley et al. 2010) involved 14,000 point sources and complex diffuse emission in a mosaic of 38 separate
ACIS observations, shown in Figure 2. AE relies heavily on tools in the Chandra Interactive Analysis of
Observations (CIAO) package (Fruscione et al. 2006). Other software packages employed by AE include the IDL Astronomy
User's Library (Landsman 1993), MARX, SAOImage DS9, FTOOLS (Blackburn 1995), XSPEC (Arnaud

Several dozen studies by several independent groups (including at least nine Large and Very Large projects)
have employed AE. Examples include: 



extragalactic survey fields: Chandra Deep Field North (Alexander et al. 2003); Chandra Deep Field South (Luo
et al. 2008, Lehmer et al. 2005); the Serendipitous Extragalactic X-ray Source Identification program (Eckart et
al. 2006); SSA22 (Lehmer et al. 2009)

lensed quasars: PG 1115+080 (Pooley et al. 2006), 1RXS J1131-1231 (Blackburne et al. 2006), survey of 10 (Pooley
et al. 2007)

galaxy clusters: Coma Cluster (Hornschemeier et al. 2006), Abell 85 and Abell 754 (Sivakoff et al. 2008a), various
(Fassnacht et al. 2008)

nearby galaxies and LINERs: IC 10 (Bauer & Brandt 2004), sample of LINERs (Flohic et al. 2006), SN 2006gy
in NGC 1260 (Smith et al. 2007), M 33 (Plucinsky et al. 2008), Centaurus A (Sivakoff et al. 2008b), SN 1996cr in
Circinus (Bauer et al. 2008), NGC 6946 and NGC 4485/4490 (Fridriksson et al. 2008)

globular clusters: 47 Tucanae (Heinke et al. 2005), Terzan 1 (Cackett et al. 2006), Terzan 5 (Heinke et al. 2006),
NGC 288 (Kong et al. 2006), M30=NGC 7099 (Lugger et al. 2007), NGC 6366 and M 55 (Bassa et al. 2008), G1
(Kong et al. 2009)

the Galactic Center: point sources (Muno et al. 2003, 2006, 2009)

young stellar clusters and star formation regions: M 17 (Townsley et al. 2003, Broos et al. 2007), the Orion
Nebula (Getman et al. 2005), L 1448 (Tsujimoto et al. 2005), Cep B (Getman et al. 2006), Wd 1 (Muno et al.
2006), 30 Dor (Townsley et al. 2006a,b), W 49A (Tsujimoto et al. 2006), NGC 6357 (Wang et al. 2007), RCW 49
(Tsujimoto et al. 2007), IC 1396N (Getman et al. 2007), Coronet cluster (Forbrich & Preibisch 2007), Tr 16
(Albacete-Colombo et al. 2008), W 3 (Feigelson & Townsley 2008), CG 12 (Getman et al. 2008), the Rosette
Nebula (Wang et al. 2008, 2009, 2010), NGC 6334 (Feigelson et al. 2009), Cygnus OB2 (Albacete Colombo et al.
2007, Wright & Drake 2009), and the Carina Nebula (Townsley et al. 2010).



Note that the present paper describes the capabilities of AE as of 2009 November, and that not all features were 
present in earlier studies. 



3. DATA PREPARATION



The Chandra Data Archive provides cleaned and calibrated X-ray data products, known as "Level 2 Data"
(L2). We rebuild L2 data products from the more primitive L1 products, also found in the archive, in order to
apply additional processing steps. Much of our L1-to-L2 processing (Townsley et al. 2003, Appendix B) will not be
described in detail here because it follows the standard recommendations shown in the CXC's Science Threads.

For example, like many observers we take the precaution of verifying and improving (if possible) the astrometry
of every observation, even though the absolute astrometry assigned to Chandra observations using the star tracker
aspect solution is often quite accurate (~0.6", 90% confidence radius). Astrometric alignment is particularly
important when multiple observations overlap, so that the single celestial position adopted for a source will produce
well-positioned extraction apertures in each of its constituent observations. Our procedure for this task is similar
to the astrometry science thread provided by the CXC. We typically align each ACIS observation to a published
astrometric catalog, rather than to a reference ACIS observation, using preliminary ACIS sources in the inner
8' × 8' portion of the field identified by the wavdetect tool (Freeman et al. 2002). Our current catalog matching
procedure derives only offset corrections (no roll correction), which are applied to the observation's aspect file and
event data using the CIAO tools wcs_update and reproject_events. In favorable circumstances with many matching
sources, Chandra fields can be aligned to the reference astrometric frame to better than 0.1" precision.
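As a rough illustration of an offset-only alignment, the sketch below matches preliminary X-ray positions to a reference catalog and takes the median offset. It is not the AE or CXC procedure: the function names, the fixed 1" match radius, and the use of a median are all assumptions made for this example.

```python
import math
from statistics import median

def estimate_offset(xray, reference, match_radius_arcsec=1.0):
    """Median (east, north) offset in arcsec that moves X-ray positions onto the reference frame.

    xray, reference: lists of (ra_deg, dec_deg). The match radius is a hypothetical choice.
    Returns (d_east, d_north, number_of_matches).
    """
    d_east, d_north = [], []
    for ra_x, dec_x in xray:
        best = None
        for ra_r, dec_r in reference:
            # Small-angle tangent-plane offsets, in arcsec.
            de = (ra_r - ra_x) * math.cos(math.radians(dec_x)) * 3600.0
            dn = (dec_r - dec_x) * 3600.0
            sep = math.hypot(de, dn)
            if sep <= match_radius_arcsec and (best is None or sep < best[0]):
                best = (sep, de, dn)
        if best is not None:
            d_east.append(best[1])
            d_north.append(best[2])
    if not d_east:
        raise ValueError("no X-ray source matched the reference catalog")
    return median(d_east), median(d_north), len(d_east)

def apply_offset(ra_deg, dec_deg, d_east, d_north):
    """Shift one position by the derived offsets (translation only, no roll)."""
    return (ra_deg + d_east / 3600.0 / math.cos(math.radians(dec_deg)),
            dec_deg + d_north / 3600.0)
```

The median is used here only because it is insensitive to a few mismatched pairs; with many matches the scatter of the individual offsets gives a feel for the achievable alignment precision.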



Other standard L1-to-L2 processing steps are applied. We remove events whose "grades" are in the standard
set of "bad" grades; we remove events arriving during time intervals designated as "bad"; we remove events arriving
during periods of very high instrumental background due to solar activity; we construct an exposure map for
each observation. In the early years of the Chandra mission we developed and implemented a technique to mitigate
the effects of charge transfer inefficiency (CTI) in both front- and back-illuminated ACIS CCDs (Townsley et al.
2000, 2002, available in the Physics database of ADS). Since this method has been incorporated into the standard
data processing by the CXC and its calibration is maintained by them, we now use the CXC's version of the CTI
corrector, part of the tool acis_process_events.

A few aspects of our Ll-to-L2 processing warrant discussion: 



Improving event locations Two procedures are used here. First, standard CXC pipeline processing adds a ±0.25"
random number to each event's position, blurring the excellent PSF at the center of the field, as discussed in
presentations at the Chandra Calibration Workshop by Marshall, Pease (page 8), and Smith (page 3). We
disable this randomization when the data are reprocessed by the CIAO tool acis_process_events.

Second, the positions of events with non-zero event grades (i.e., where some charge appears in neighboring pixels)
can be somewhat improved by "sub-pixel resolution" algorithms described by Tsunemi et al. (2001), Mori et al.
(2001), and Li et al. (2004). Either of these two procedures will tighten the PSF in the inner portion of the field;
the degree to which the PSF is improved depends upon the fraction of multi-pixel events, which in turn depends
on the spectrum of the source and whether the source is imaged on a front-illuminated or a back-illuminated CCD.



Bad Pixel List For most studies we use and recommend a less-aggressive Bad Pixel List than the one produced by
CIAO. The ACIS pixels in the default list that we choose to revive were originally deprecated because they have
elevated background at very low energies (usually < 0.5 keV, occasionally up to 1 keV) that can cause problems
for the analysis of very soft diffuse structures. Because the instrumental background increases sharply below
0.5 keV on the front-illuminated CCDs that make up ACIS-I, event energies < 0.5 keV are customarily ignored
anyway. We prefer to accept the small residual increased background above 0.5 keV rather than lose observatory
effective area. Our custom Bad Pixel List recovers > 4% of the columns in the ACIS-I array. A somewhat larger
improvement in effective area (averaged over ACIS-I) is achieved because standard processing of the Bad Pixel
List discards events detected in the two columns adjacent to a bad column as well as the bad column itself.

2MASS Point Source Catalog (Skrutskie et al. 2006), which are on the accurate Hipparcos reference frame.

Both the Tsunemi (http://wwwxray.ess.sci.osaka-u.ac.jp/-mori/chandra/index.html) and Li
(http://www.cis.rit.edu/people/faculty/kastner/SER/ser.html) groups have released tools implementing their sub-pixel resolution algorithms.

A new observation-specific Bad Pixel Table is constructed by re-running the CIAO tool acis_run_hotpix with its badpixfile parameter
pointing to an edited list of permanent bad pixels in place of the default list.



Bifurcated Workflow One component of the ACIS instrumental background is a cosmic ray artifact known as
"afterglow," operationally defined as a group of events appearing at nearly the same location on the detector
in nearly consecutive CCD frames. Eliminating afterglow events is essential to any project that seeks to detect
weak point sources, since a group of afterglow events can easily be mistaken as a weak point source. Prior to 2004
the CXC used a tool named acis_detect_afterglow to identify afterglow events; this aggressive tool is quite effective
(few false negatives), but suffers from many false positives: events from even moderately bright sources that are
mistakenly identified as afterglows. In 2004 the CXC adopted an alternative tool named acis_run_hotpix; this
gentle tool effectively controls false positives, but misses most afterglow series containing fewer than 10 events.
In many ACIS observations, even these short afterglow series will be interpreted as statistically significant point
source detections.
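To make the operational definition of afterglow concrete, here is a minimal sketch that flags runs of events landing on the same detector pixel in nearly consecutive frames. It is not the algorithm used by either CXC tool; the function name, the exact-pixel grouping, and the `max_frame_gap` and `min_events` parameters are assumptions chosen for illustration.

```python
from collections import defaultdict

def flag_afterglow_candidates(events, max_frame_gap=2, min_events=3):
    """Return sorted indices of events that form a suspicious run on one pixel.

    events: list of (chip_x, chip_y, frame_number).
    A "run" is a sequence on one pixel whose successive frame numbers differ by
    at most max_frame_gap; runs with at least min_events members are flagged.
    Both thresholds are hypothetical, not values taken from any CXC tool.
    """
    by_pixel = defaultdict(list)
    for index, (x, y, frame) in enumerate(events):
        by_pixel[(x, y)].append((frame, index))

    flagged = []
    for hits in by_pixel.values():
        hits.sort()
        run = [hits[0]]
        for previous, current in zip(hits, hits[1:]):
            if current[0] - previous[0] <= max_frame_gap:
                run.append(current)
            else:
                if len(run) >= min_events:
                    flagged.extend(i for _, i in run)
                run = [current]
        if len(run) >= min_events:
            flagged.extend(i for _, i in run)
    return sorted(flagged)
```

A rule this simple also flags genuine events from a bright source, whose photons frequently arrive on the same pixel in adjacent frames; that is the false-positive problem described above, and lowering `min_events` to catch short afterglow series makes it worse.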

Two additional standard techniques for reducing the ACIS instrumental background suffer the same problem of
false positives near bright sources. The first involves an event grading technique (implemented by the CIAO
tool acis_process_events) that is available for data taken in Very Faint Mode. The second involves removing a type
of artifact found on the ACIS-S4 CCD (implemented by the CIAO tool destreak).

The lowest instrumental background level can be obtained by applying these two cleaning procedures and the 
aggressive afterglow detection procedure. This aggressive cleaning scheme is preferred when searching for weak 
point sources and when studying diffuse emission, but is not appropriate when extracting bright sources because
the number, spatial distribution, and spectral distribution of events mistakenly removed by these algorithms (false
positives) near bright sources are not well known.

Thus, we recommend the bifurcated data reduction workflow shown in Figure 1. An aggressively cleaned L2 event
list is used to detect point sources and to extract diffuse sources, whereas point sources are extracted from an 
L2 event list that has been only mildly cleaned. This strategy is not ideal because point source extractions will 
occasionally contain some detectable afterglow events contaminating the actual X-ray events that produced the 
detection. We mitigate this problem by checking each extracted source for residual afterglow events and flagging 
sources dominated by afterglows. 

An appendix shows one method for running both afterglow detection tools and then applying either the
gentle or aggressive cleaning criteria discussed above. Recipes and software implementing this L1-to-L2 processing
are available upon request. 



4. POINT SOURCE DETECTION 

4.1. An Iterative Source Detection Strategy 

Typically, the process of source detection is a distinct precursor to source extraction and analysis. Source
detection algorithms must estimate some sort of significance for a putative source's signal with respect to the background
expected to contaminate that signal. In other words, source detection algorithms must perform some type of
extraction of a proposed source, estimate the local background, and compute some type of significance statistic from those
two quantities.

Much of this paper describes the algorithms we have developed to perform those same three tasks, extraction
(§5), background estimation (§5.4), and characterizing source significance (§4.3), on complex ACIS campaigns involving
multiple misaligned observations of crowded fields of point sources. We expect these careful algorithms to produce
higher quality results for point sources than can possibly be produced by typical source detection schemes that
operate on binned images, do not have knowledge of the Chandra PSF, and are not designed for multiple misaligned
observations.
Thus, we have adopted and we recommend a source detection strategy that combines the convenience and
efficiency of traditional source detection tools with the accuracy achieved only by detailed extraction and analysis
of an entire catalog. First, a liberal catalog of candidate point sources is obtained from a variety of traditional
source detection methods, run with aggressive thresholds that seek to find weak sources but are expected to produce
significant numbers of spurious detections. We extract and characterize the significance of the candidates, and then
prune those found not to be significant. Nandra et al. (2005) independently developed a similar strategy.



The trimmed catalog is then extracted again, re-pruned, and so on until all the remaining sources are deemed
significant. This iterative pruning procedure, depicted in Figure 1 as a loop through the steps "PS catalog," "PS
extraction," and "prune/reposition sources," is necessary because construction of both source apertures (§5.1) and
background regions (§5.4) in crowded fields can be more appropriately performed when the existence of neighboring
sources is taken into account. In contrast, typical detection algorithms estimate background for a putative source
using the nearby pixels in an image, without explicit regard for possible contamination by nearby sources. Typically,
only one iteration is needed for situations where source crowding is unimportant.
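The extract-and-prune loop can be sketched as below. This is only a schematic of the control flow, not AE code: `prune_catalog`, the `extract` callback, and the toy catalog are hypothetical, and the toy model simply assumes that events of a pruned neighbor fall back into the background of nearby survivors.

```python
def prune_catalog(candidates, extract, threshold, max_iterations=20):
    """Re-extract and prune until every remaining source passes the threshold.

    extract(catalog) must return {source: probability that no source exists},
    computed with knowledge of the *current* catalog (neighbors matter).
    Returns (surviving catalog, number of extraction passes performed).
    """
    catalog = list(candidates)
    for iteration in range(1, max_iterations + 1):
        p_values = extract(catalog)
        survivors = [s for s in catalog if p_values[s] <= threshold]
        if len(survivors) == len(catalog):
            return catalog, iteration          # nothing pruned: catalog is stable
        catalog = survivors
    return catalog, max_iterations

# Toy catalog (hypothetical numbers): name -> (position, p-value with all neighbors masked).
TOY = {"A": (0.0, 1e-4), "B": (10.0, 2e-3), "C": (10.5, 0.5)}

def toy_extract(catalog):
    """Toy stand-in for extraction: each pruned neighbor within 1.0 triples the p-value."""
    p_values = {}
    for name in catalog:
        position, base_p = TOY[name]
        dropped = sum(1 for other, (other_pos, _) in TOY.items()
                      if other not in catalog and abs(other_pos - position) < 1.0)
        p_values[name] = min(1.0, base_p * 3.0 ** dropped)
    return p_values
```

With a threshold of 0.003 the toy run needs three passes: C is pruned first, B fails once C's events count as background, and the final pass confirms that A alone is stable. An uncrowded catalog of significant sources returns after a single pass.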

We test the significance of candidate sources in three energy bands (0.5-2 keV, 2-7 keV, and 0.5-7 keV) and 
adopt the highest value. We prefer the upper limit of 7 keV instead of the more typical value of 8 keV because there 
is a line in the instrumental background spectrum at 7.47 keV. When a source has been observed multiple times, we 
adopt the highest significance among all combinations of observations (§6.2).



4.2. Establishing the Candidate Source List 



We construct the list of candidate sources using a combination of several methods. Many sources can be identified
by a wavelet-based detection algorithm for Poisson images that has been widely used on Chandra data, the CIAO
tool wavdetect (Freeman et al. 2002), run with the liberal threshold of 1 × 10^-5 rather than the commonly used
level of 1 × 10^-6 in order to nominate as many actual sources as possible without nominating an excessive number
of candidates that will later be found to be insignificant. For a simple field with one pointing, twelve images are
searched, corresponding to three energy bands (0.5-2 keV, 2-7 keV, and 0.5-7 keV) and four image pixel sizes (0.25",
0.5", 1", and 2") to sample appropriately the variable Chandra PSF. All the wavdetect catalogs are merged: pairs of
catalogs are matched using the algorithm described elsewhere in this paper, and the most accurate position from each matching pair
is retained. The resulting catalog is visually examined to remove any remaining obvious duplicates.
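The bookkeeping for the twelve detection images and the pairwise catalog merge might look like the sketch below. The fixed-radius nearest-neighbor match is a simple stand-in for the paper's matching algorithm, which is not reproduced in this section; the dictionary keys (`x`, `y`, `pos_err`) and the 1" radius are assumptions.

```python
import math
from itertools import product

ENERGY_BANDS_KEV = [(0.5, 2.0), (2.0, 7.0), (0.5, 7.0)]
PIXEL_SIZES_ARCSEC = [0.25, 0.5, 1.0, 2.0]

def detection_runs():
    """One entry per image to be searched: 3 bands x 4 pixel sizes = 12."""
    return [{"band_kev": band, "pixel_arcsec": pixel}
            for band, pixel in product(ENERGY_BANDS_KEV, PIXEL_SIZES_ARCSEC)]

def merge_catalogs(master, new, match_radius_arcsec=1.0):
    """Merge two catalogs, keeping the more accurate position of each matched pair.

    Sources are dicts with tangent-plane positions x, y (arcsec) and a position
    uncertainty pos_err (arcsec). Unmatched sources from `new` are appended.
    """
    merged = [dict(source) for source in master]
    for source in new:
        best_index, best_sep = None, match_radius_arcsec
        for index, existing in enumerate(merged):
            sep = math.hypot(source["x"] - existing["x"], source["y"] - existing["y"])
            if sep <= best_sep:
                best_index, best_sep = index, sep
        if best_index is None:
            merged.append(dict(source))
        elif source["pos_err"] < merged[best_index]["pos_err"]:
            merged[best_index] = dict(source)
    return merged
```

Folding the twelve catalogs together one at a time with `merge_catalogs` yields a single candidate list; a fixed radius is too crude far off-axis, where the PSF is broad, which is one reason the final visual check for duplicates remains useful.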

Wavelet decomposition of images is often not effective in resolving closely spaced sources, as the method is
designed to find structures on a range of spatial scales. For such situations, image reconstruction algorithms that
remove the blurring effects of the point spread function can significantly improve source identification. We search
for faint and crowded sources by locating peaks in reconstructed images obtained with the Lucy-Richardson
algorithm (Lucy 1974), implemented in the IDL Astronomy User's Library. Similar image reconstructions have been
invaluable for achieving effective spatial resolution as good as ~0.3" FWHM in observations of Chandra targets, for
example a jet in the Chandra first-light target (Chartas et al. 2000) and SN 1987A (Burrows et al. 2000). As the
PSF varies strongly across the Chandra field due to the telescope optics, the image reconstruction is performed on
many small overlapping tiles (Figure 3) using local PSFs. The candidate sources identified in these tiles are merged
with those obtained with the wavdetect algorithm.
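For readers unfamiliar with the Lucy-Richardson iteration, a minimal pure-Python sketch for one small tile follows. It is not the IDL Astronomy User's Library routine: the function names, the flat starting image, the zero-padded edges, and the peak-finding rule are assumptions made for this example.

```python
def convolve2d(image, kernel):
    """Same-size 2-D convolution with zero padding; kernel dimensions must be odd."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    cy, cx = krows // 2, kcols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            for j in range(krows):
                for i in range(kcols):
                    yy, xx = y - (j - cy), x - (i - cx)
                    if 0 <= yy < rows and 0 <= xx < cols:
                        total += image[yy][xx] * kernel[j][i]
            out[y][x] = total
    return out

def richardson_lucy(data, psf, iterations=25):
    """Multiplicative update: estimate *= correlate(data / convolve(estimate, psf), psf)."""
    rows, cols = len(data), len(data[0])
    flipped = [row[::-1] for row in psf[::-1]]      # correlation = convolution with flipped PSF
    mean = sum(map(sum, data)) / (rows * cols)
    estimate = [[mean] * cols for _ in range(rows)]  # flat, positive starting image
    for _ in range(iterations):
        blurred = convolve2d(estimate, psf)
        ratio = [[data[y][x] / blurred[y][x] if blurred[y][x] > 0 else 0.0
                  for x in range(cols)] for y in range(rows)]
        correction = convolve2d(ratio, flipped)
        estimate = [[estimate[y][x] * correction[y][x] for x in range(cols)]
                    for y in range(rows)]
    return estimate

def find_peaks(image, min_value):
    """Interior pixels at or above min_value that exceed all eight neighbors."""
    peaks = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            value = image[y][x]
            neighbors = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0)]
            if value >= min_value and all(value > n for n in neighbors):
                peaks.append((y, x))
    return peaks
```

The update preserves the total number of counts when the PSF is normalized, and each pass concentrates flux back toward the positions of point sources, so peaks in the reconstructed tile serve as source candidates. How many iterations are appropriate for real, noisy data is a judgment call that this sketch does not address.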

Figure 4 shows an example of the effectiveness of image reconstruction in regions with crowded, faint sources. The
wavdetect procedures located 50 sources in this sub-image of the center of the young stellar cluster Trumpler 14; an
additional 50 are found as peaks in the reconstructed images (Townsley 2006; Townsley et al. 2010). The reliability
of most of these sources is confirmed; 89 of the 100 reconstruction sources coincide with stars detected in deep,
high-resolution near-infrared exposures (Thomas Preibisch, private communication; Ascenso et al. 2007), some with
separations <1". Of the 11 unconfirmed X-ray sources, 9 appear in close pairs where the other member of the pair
is confirmed by an IR counterpart. Testing the validity of these X-ray sources will require very high-resolution,
sensitive IR data, as it may be difficult to find faint, close companions in IR observations of such crowded regions
suffused by diffuse IR emission. Since the X-ray flux of a young star does not correlate closely with its IR brightness,
it is not unreasonable for us to find close X-ray pairs that are not seen in IR images, and vice versa.

http://asc.harvard.edu/ciao/download/doc/detect_manual

http://idlastro.gsfc.nasa.gov/

Fig. 3. Many 1.5' × 1.5' image reconstruction tiles covering an ACIS-I pointing. Tiles nominally overlap by
1/4 along both axes. Individual tiles can be seen at the field edges; a single tile at the field center is highlighted.
Each tile image is reconstructed using a local Chandra-ACIS PSF; peaks in the reconstruction produce point source
candidates.
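The tiling of a field into overlapping reconstruction tiles, as in the Figure 3 caption, can be laid out with a few lines of arithmetic. In this sketch the 1/4 overlap is read as adjacent tiles sharing one quarter of a tile width, and the last tile is shifted to sit flush with the field edge; both choices, and the function names, are assumptions.

```python
def tile_starts(field_arcmin, tile_arcmin=1.5, overlap_fraction=0.25):
    """Lower edges of tiles along one axis, stepping by tile * (1 - overlap)."""
    stride = tile_arcmin * (1.0 - overlap_fraction)
    starts = []
    position = 0.0
    while position + tile_arcmin < field_arcmin:
        starts.append(position)
        position += stride
    # Final tile is placed flush with the far edge of the field.
    starts.append(max(field_arcmin - tile_arcmin, 0.0))
    return starts

def tile_corners(field_arcmin, tile_arcmin=1.5, overlap_fraction=0.25):
    """(x0, y0) lower-left corners for a square field tiled along both axes."""
    starts = tile_starts(field_arcmin, tile_arcmin, overlap_fraction)
    return [(x0, y0) for y0 in starts for x0 in starts]
```

For a 6' field this gives five tiles per axis, starting at 0, 1.125, 2.25, 3.375, and 4.5 arcmin. Because tiles overlap, a source near a tile boundary can be recovered in more than one tile, so the per-tile candidate lists still need to be merged and de-duplicated.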

When a project's scientific goals include the search for faint X-ray emission from a previously defined catalog of
interesting objects found in other wavebands, the candidate X-ray source catalog can be supplemented with source
positions that are not derived from the ACIS image. The AE procedures can then judge whether significant emission
is present from each single source, and the positions can be 'stacked' for high-sensitivity examination of the collective
properties of the user-provided catalog (§7.7).



Finally, we often supplement the candidate source catalog with the coordinates of suspected point sources 
identified from visual review (e.g., with the DS9 visualization program) of the X-ray data. This visual review also 
allows removal of obviously spurious candidate sources that occasionally emerge (e.g., from detector artifacts such 
as bright source readout-streaks) and duplicate sources. 



4.3. Thresholding a Source Significance Statistic 

In typical source detection schemes, the existence of a candidate source is evaluated using the signal-to-noise ratio 
(SNR), which is a photometry value divided by its uncertainty. When photometry values have Gaussian distributions, 
then a SNR threshold directly corresponds to a level of significance in a statistical test of the null hypothesis that 
there is no source signal present, only background. 

However, most ACIS sources are quite weak with very few counts extracted from the source aperture, and
in crowded fields sometimes very few counts available to estimate the background. Gaussian approximations to
photometric confidence intervals in such cases can be quite poor. We prefer to test directly the null hypothesis that
a candidate source does not exist using the method described by Weisskopf et al. (2007, Appendix A2) based on the
Poisson distribution. Computation of this significance statistic, P_B, is described in Appendix B.
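Appendix B is not reproduced in this section, so the sketch below uses one common formulation of such a test as an assumption: if no source exists, each of the events found in the source aperture or the background region lands in the aperture with probability 1/(1 + BACKSCAL), where BACKSCAL is the background-to-source scaling, and the statistic is the binomial probability of seeing at least the observed number of aperture counts. The function names and the `backscal` argument are hypothetical.

```python
from math import comb

def prob_no_source(src_counts, bkg_counts, backscal):
    """P(at least src_counts in the aperture | background only).

    src_counts: events in the source aperture
    bkg_counts: events in the background region
    backscal:   (background region area*exposure) / (source aperture area*exposure)
    """
    total = src_counts + bkg_counts
    p_aperture = 1.0 / (1.0 + backscal)
    return sum(comb(total, k) * p_aperture ** k * (1.0 - p_aperture) ** (total - k)
               for k in range(src_counts, total + 1))

def most_significant_band(band_counts):
    """Pick the band with the smallest no-source probability.

    band_counts: {band name: (src_counts, bkg_counts, backscal)}
    """
    results = {band: prob_no_source(*counts) for band, counts in band_counts.items()}
    best = min(results, key=results.get)
    return best, results[best]
```

Small values indicate a significant source, so testing several energy bands and adopting the highest significance corresponds to taking the minimum over bands. No Gaussian approximation is involved, which is the point of the approach when only a handful of counts are available.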

The observer must set, either a priori or after careful examination of the image and possible multi- wavelength 
counterparts, a threshold value of Pb that strikes a reasonable balance between the competing goals of low false 
detection rates and high sensitivity. Analytical methods to estimate false detection rates and sensitivity have been 



developed (e.g., by Nandra et al. 2005 Georgakakis et al. 20081, however extending these methods to studies that 
Fig. 4. — The central 100 ACIS X-ray sources (diamonds) identified in the crowded core of the young star cluster
Trumpler 14 (Townsley 2006; Townsley et al. 2010) by combining wavdetect sources (50 green ellipses) and peaks in
a reconstructed image (§4.2). The underlying image shows the ACIS data binned at 0.5" per pixel; the coordinate
axes are J2000 Right Ascension and Declination. X-ray source extraction apertures constructed by AE (§5.1) are
not shown. The 86 sources confirmed by near-infrared observations using NTT/SOFI and VLT/NACO (Ascenso
et al. 2007) are shown in red; the median offset between X-ray and IR positions is 0.14". An additional 3 sources
(magenta) are confirmed by new VLT/HAWK-I observations (Thomas Preibisch, private communication). The 11
sources currently not identified in other wavebands are shown in blue.



involve crowded source apertures (§5.1) and multiple overlapping pointings (§6.2) is not straightforward. Monte Carlo
simulation of the detection process (e.g., by Cappelluti et al. 2007; Kim et al. 2007) is a powerful technique to study
false detection rates and sensitivity; however, simulation of our complex detection process would require unreasonable
quantities of both human and computing resources. Thus, we do not currently have an objective or authoritative way 
to set detection thresholds. As with most source detection tasks, the scientist's subjective judgment is critical for 
setting criteria for source existence, and high-quality observations in other wavebands are invaluable for informing 
that judgment. 
5. POINT SOURCE EXTRACTION



The process of "extracting" a putative point source consists of several tasks: defining an appropriate aperture 
around the source position, defining an appropriate background region that is expected to be a good estimator for the 
background contaminating the source aperture, collecting the observed events within the aperture and background
regions, and constructing a model for the response of the observatory that correctly calibrates the extraction so that 
intrinsic source properties can be derived. 

When the source is assumed to be point-like, all of these tasks are best performed by algorithms that consider 
the shape of the local Chandra-ACIS point spread function (PSF). Thus, AE begins the extraction process by
constructing a model of the local PSF (Appendix C).



5.1. Construction of Extraction Aperture and Mask Region 



Many Chandra observers obtain extraction apertures for their point sources either from the CIAO tool wavdetect
(Freeman et al. 2002), the most popular source detection tool in the Chandra community, or by drawing the
apertures by eye. Because wavdetect does not have explicit knowledge of the Chandra PSF, and because it seeks to
detect both point-like and extended sources, it sometimes produces aperture shapes that are radically different from
the PSF (e.g., ellipses with extreme eccentricity). An additional concern with apertures from wavdetect or visual
inspection is the risk of introducing an upward bias into photometry because both tend to "follow the light" — including
only regions of the image where, by chance, the source produced an excess of events over the background. In our
opinion, a region derived from the local PSF is the most objective and appropriate extraction aperture for point
sources.

When a source is "extended" (resolved by Chandra), the spatial distribution of its observed events does not follow
the local PSF, and application of the point source extraction techniques in this section will underestimate the
flux of the source. Such sources are more properly designated as "diffuse" and should be extracted by procedures
that do not make the point-like assumption (see the discussion of diffuse source extraction). No automatic procedure
for defining the extraction aperture of
such a source seems feasible — one must either define an aperture that follows the observed light (as wavdetect does),
or apply a priori information about the structure of the source to define a suitable aperture. Observers often struggle
to decide if a source is extended. The Chandra Source Catalog provides a sophisticated analysis of source extent,
and the CXC has recently introduced a science thread for measuring source extent. AE does not currently address
this task.



Several observers employ elliptical approximations to the PSF as extraction apertures, e.g., Nandra et al. (2005)
and the Chandra Source Catalog use ellipses that enclose 70% and 90% of the PSF power, respectively. AE
extraction apertures are built from contours of the local PSF at 1.5 keV, as shown in Figures 5 and 6. By default,
AE apertures enclose ~90% of the PSF power, which we have found is a reasonable trade-off between maximizing the
source's signal and minimizing the background's signal in typical fields. AE does not currently attempt to optimize
the aperture size based on the source and background levels (e.g., use a large aperture for bright sources and a small
one for sources just above the background).

For many Chandra observations, some of these default apertures will overlap significantly due to source crowding.
AE iteratively reduces the aperture sizes of crowded sources until the apertures no longer overlap (Figure 6). The
brighter member of a pair of crowded sources maintains its default aperture until the aperture for the weaker member 
has been driven down to a minimum allowed size (enclosing ~40% of the PSF power). Of course, some light from 



http://asc.harvard.edu/ciao/download/doc/detect_manual

See for example the analysis threads "Using psextract to Extract ACIS Spectra and Response Files for Pointlike Sources"
(http://asc.harvard.edu/ciao/threads/psextract/) and "Step-by-Step Guide to Creating ACIS Spectra for Pointlike Sources"
(http://asc.harvard.edu/ciao/threads/pieces/).

http://cxc.harvard.edu/csc/columns/srcextent.html

http://cxc.harvard.edu/ciao/threads/srcextent/

The Chandra Source Catalog performs two extractions for each source,
one using a wavdetect ellipse and one using a PSF ellipse.

http://asc.harvard.edu/csc/columns/fluxes.html
Fig. 5. — Example of extraction apertures (contours of the local PSF) enclosing 90% of the PSF power constructed
by AE (§5.1) for a source observed at three off-axis angles — 3.7' (upper-left), 4.1' (upper-right), and 9.1' (lower-left).
The shape and overall size of the Chandra PSF varies significantly across the field of view. Combining all of the
available extractions of a source (lower-right) will in some cases produce lower-quality estimates of source properties
than could be obtained by ignoring some observations (e.g., the far off-axis data in the lower-left panel). Techniques
for deciding which observations to ignore are discussed in §6.2.



the wings of nearby sources may contaminate even a reduced source aperture; this light constitutes an additional
background component for the source. AE provides a sophisticated background algorithm that models and subtracts
this component (§5.4).



In a particular observation, a source may be so highly crowded that assigning minimal apertures to it and its 
neighbor cannot prevent overlap. When multiple observations of the source are available, such overlapping extractions
are discarded by AE (§6.2) in recognition that there are limits to our ability to estimate backgrounds (§5.4) under
conditions of extreme crowding. If all extractions of a pair of candidate sources are overlapping, then our policy is to
eliminate the weaker candidate in recognition that either the detection process has mistakenly split a single physical 
source into two candidates, or that two physical sources are not resolved by our observation (i.e., meaningful source 
properties cannot be determined for each source). 
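
The aperture-reduction policy for a crowded pair can be sketched as follows. This is a toy illustration only: it assumes circular apertures and a circular Gaussian PSF (AE uses contours of the actual local PSF), and the names radius and shrink_pair, the step size, and the example positions are hypothetical.

```python
from math import hypot, log, sqrt


def radius(psf_frac, sigma):
    """Radius enclosing psf_frac of a circular Gaussian PSF (toy stand-in)."""
    return sigma * sqrt(-2.0 * log(1.0 - psf_frac))


def shrink_pair(pos_a, pos_b, counts_a, counts_b, sigma,
                default=0.90, minimum=0.40, step=0.02):
    """Return (frac_a, frac_b, still_overlapping) for two crowded sources."""
    sep = hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1])
    frac = {"a": default, "b": default}

    def overlapping():
        return radius(frac["a"], sigma) + radius(frac["b"], sigma) > sep

    # The weaker source shrinks first; the brighter keeps its default
    # aperture until the weaker one has reached the minimum size.
    order = ("a", "b") if counts_a < counts_b else ("b", "a")
    for name in order:
        while overlapping() and frac[name] - step >= minimum - 1e-9:
            frac[name] = round(frac[name] - step, 6)
    return frac["a"], frac["b"], overlapping()


# Hypothetical pair: a bright source and a weak neighbor 3.6 PSF sigmas away.
print(shrink_pair((0.0, 0.0), (3.6, 0.0), 200, 15, sigma=1.0))
```

When the third returned value is true, even minimal apertures overlap, which corresponds to the case in which the extraction (or the weaker candidate) is discarded.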




Fig. 6. — Example of crowded extraction apertures (contours of the asymmetric local PSF at ~ 8' off-axis) that have 
been reduced in size (enclosing 56% and 44% of the PSF power) until they do not overlap. These two X-ray source 
positions (plusses) are confirmed by a high-resolution infrared observation. 



A PSF model and an extraction region must be constructed independently for each observation of a source. A 
pair of sources may be well-separated with nominally-sized apertures in an on-axis observation, but may suffer severe 
crowding with reduced apertures in an off-axis observation. Even a source observed twice at similar off-axis angles 
may have apertures with very different shapes, since the azimuthal shape of the Chandra PSF varies across the focal
plane.



5.2. Event Extraction and Calibration Data Products 

Once apertures are determined for each observation of a source, a mostly routine extraction process is followed
for each observation using standard CIAO tools — dmextract selects the events within the aperture and constructs
a Type I HEASARC/OGIP-compatible "source" spectrum; mkarf queries the Chandra Calibration Database and
constructs an ancillary response file (ARF); mkacisrmf queries the Chandra Calibration Database and constructs a
response matrix file (RMF). When an aperture dithers across multiple CCDs, standard extraction methods will
over-estimate the source flux (by as much as 50%) because the response at the source position of only one CCD is
computed (using a single call to mkarf). AE avoids this mis-calibration of the extraction by computing and summing
the response at the source position of all the CCDs (using multiple calls to mkarf).



5.3. Aperture Correction 

Perhaps the most important calibration facilitated by the PSF model is accounting for the point source light 
falling outside the aperture and not extracted — a standard part of optical and infrared data analysis commonly 
referred to as "aperture correction." Such correction is necessary because the Chandra Calibration Database effective 
area data assume an infinitely large detector and extraction aperture.

Since the size of the Chandra PSF varies significantly with photon energy, the aperture correction is a function
of energy. AE builds images of the local PSF at five monochromatic energies (Appendix C), and calculates for each
PSF image the fraction of the power that falls within the aperture. Those five "PSF fractions" are interpolated to
estimate a PSF fraction at every energy. The product of this PSF fraction curve and the nominal ARF produced
by CIAO represents the true observatory effective area that is supplying photons within the aperture; this corrected
ARF is carried forward in the analysis. The energy and off-axis dependence of this aperture correction is illustrated
in Figure 7.
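
The correction amounts to a channel-by-channel multiplication, which the following sketch illustrates. Linear interpolation between the five energies, the function names, and all numerical values are assumptions chosen for illustration; they are not AE's calibration values.

```python
def psf_fraction_at(energy, knot_energies, knot_fractions):
    """Linearly interpolate the PSF fraction; hold the end values outside."""
    if energy <= knot_energies[0]:
        return knot_fractions[0]
    if energy >= knot_energies[-1]:
        return knot_fractions[-1]
    for i in range(1, len(knot_energies)):
        if energy <= knot_energies[i]:
            e0, e1 = knot_energies[i - 1], knot_energies[i]
            f0, f1 = knot_fractions[i - 1], knot_fractions[i]
            return f0 + (f1 - f0) * (energy - e0) / (e1 - e0)


def correct_arf(arf_energies, arf_area, knot_energies, knot_fractions):
    """Multiply a nominal ARF by the interpolated PSF fraction curve."""
    return [area * psf_fraction_at(e, knot_energies, knot_fractions)
            for e, area in zip(arf_energies, arf_area)]


# Illustrative numbers only (keV, PSF fraction, cm^2).
knots_e = [0.3, 1.5, 4.5, 6.4, 8.6]
knots_f = [0.92, 0.90, 0.86, 0.82, 0.78]
corrected = correct_arf([1.0, 1.5, 3.0], [500.0, 600.0, 400.0],
                        knots_e, knots_f)
print(corrected)
```

Because the PSF fraction is below unity at every energy, the corrected effective area is always smaller than the nominal one, so fluxes derived from it are larger.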

Omitting an aperture correction has two scientific effects. Firstly, flux estimates are biased downward (by ~10%
for the examples in Figure 7 for a typical observed spectrum peaking near 1.5 keV). Secondly, the inferred shape
of the source spectrum is somewhat distorted, especially for off-axis sources, because the PSF fraction varies with
energy (Figure 7, lower panel).



5.4. Background Extraction 

Assessment of the detection significance and the spectral properties of a weak source depends critically on
estimation of unbiased background spectra for each extraction (each observation) of the source. Often, background 
estimates must be performed "locally" for each source to account for spatial variations due to diffuse emission, or 
wings of nearby point sources. AE supports three methods of constructing local background spectra. 

1. When all sources are well separated, a traditional approach using annular background regions is adequate. AE 
implements this approach by first masking (removing) virtually all the point source light from the data set using 



See Figure 10 in the HRMA User's Guide (http://cxc.harvard.edu/cal/Hrma/users_guide/).

http://heasarc.gsfc.nasa.gov/docs/xanadu/xspec/fits/fitsfiles.html

See for example the analysis threads "Using psextract to Extract ACIS Spectra and Response Files for Pointlike Sources"
(http://asc.harvard.edu/ciao/threads/psextract/) and "Step-by-Step Guide to Creating ACIS Spectra for Pointlike Sources"
(http://asc.harvard.edu/ciao/threads/pieces/).

For example, the ahelp page for mkarf says "The ARF is computed assuming that the spectral extraction region is large enough to
include the entire PSF (e.g., PSF fraction=1.0)."

See Figures 5 and 6 in the HRMA User's Guide (http://cxc.harvard.edu/cal/Hrma/users_guide/).
Fig. 7. — Effective area as a function of X-ray energy (upper panel) for typical on-axis (black) and off-axis (gray)
sources extracted by AE using default extraction apertures (90% PSF fraction at 1.5 keV, §5.1). Each aperture-
corrected (dashed) curve is the product of the nominal curve (solid, from mkarf) and a PSF fraction curve (lower
panel) obtained by interpolating measurements at five energies (dots).

circular mask regions with a nominal radius that is 1.1 times a radius that encloses 99% of the PSF. Circular 
background regions are then defined independently for each source to encompass a number of background events 
specified by the observer. 

2. AE also provides an alternative masking algorithm that first models the surface brightness expected from the 
point sources (using the PSF models and rough estimates of source fluxes), and then iteratively seeks to mask 
(remove) all portions of the field where the ratio of that point source glow to the local background level exceeds a 
threshold. This algorithm allows the large wings of bright sources to be removed with large masks, while avoiding 
excessive loss of background area around weak sources, as shown in Figure 2. Circular background regions are
then defined independently for each source to encompass a number of background events specified by the observer. 

3. When sources are crowded, the concept of creating background spectra from a data set cleaned of all point source 
light is not appropriate. By definition, a significant component of the background for a crowded source arises from 
the wings of its neighbors; those wings must be modeled. 

The ultimate analysis strategy would conceptually separate a source's background into a point source wing com- 
ponent and a traditional smooth component not associated with detected point sources. The latter would be 
estimated via the masking techniques described above, and the former would be modeled via a complex multi- 
source spatio-spectral modeling process. Lacking the resources to implement that strategy, we have implemented a 
less complex strategy that seeks to estimate both components within each source's aperture by carefully sampling 
the observed event data. 

AE first constructs a spatial model for every source (using the PSF models and rough estimates of source fluxes), 
and then iteratively constructs a background region for each source that seeks to sample fairly the light that each 
neighbor's wing is depositing into the source aperture. In the simple case where the source has one neighbor that 
is contributing background to the source aperture, the algorithm will tend to build a background region that is an 
annulus around the neighbor, as shown in Figure 8. As the background region grows during the iteration, it will seek
to avoid the wings of more distant sources that are not contributing light to the source aperture; thus for uncrowded
sources this algorithm produces background regions similar to those of algorithm #2. AE tries to balance the competing
goals of acquiring the desired number of background events, fairly representing each background component (wings 
from each neighboring point source, diffuse emission, and instrumental background), and sampling the background 
locally (acquiring background events in a compact region around the target source). 
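
The first (mask-then-grow) method listed above is simple enough to sketch. The code below is a minimal illustration under stated assumptions: events and sources are given as planar (x, y) positions, each source has a known radius enclosing 99% of its PSF, and the function name background_radius is hypothetical. The modeling-based methods #2 and #3 are considerably more involved and are not attempted here.

```python
from math import hypot


def background_radius(target, sources, r99, events, n_wanted, mask_scale=1.1):
    """Radius of the smallest circle about `target` holding n_wanted unmasked events.

    sources: (x, y) positions of all point sources, target included.
    r99:     radius enclosing 99% of the PSF for each source.
    Returns None when the event list cannot supply n_wanted background events.
    """
    def masked(ev):
        # An event is masked if it lies within mask_scale * r99 of any source.
        return any(hypot(ev[0] - sx, ev[1] - sy) <= mask_scale * r
                   for (sx, sy), r in zip(sources, r99))

    distances = sorted(hypot(ev[0] - target[0], ev[1] - target[1])
                       for ev in events if not masked(ev))
    if len(distances) < n_wanted:
        return None
    return distances[n_wanted - 1]


# Hypothetical field: one source at the origin and five events.
events = [(0.5, 0.0), (1.05, 0.0), (2.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
print(background_radius((0.0, 0.0), [(0.0, 0.0)], [1.0], events, n_wanted=2))
```

The observer-specified number of background events (n_wanted here) controls the trade-off between statistical precision and locality of the background estimate.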

For either masking-based approach, the observer can manually specify masks to remove portions of the field 
that are not suitable background estimators, such as CCD readout streaks associated with bright sources. For the 
Fig. 8. — A background region constructed by AE (pixels marked in yellow) for a crowded source (red polygon). The
region seeks to sample neighboring sources (purple circles) in proportion to their expected contamination of the red
source's extraction aperture.



spatial modeling-based approach, the observer can supply a spatial model for background structures that are not 
point sources, such as CCD readout streaks; such structures will be avoided or included in individual background 
regions to the extent that the source aperture is contaminated by them. 

Regardless of how the background region is defined, we believe that the scaling traditionally applied to the back- 
ground spectrum — the ratio of the geometric areas of the source aperture and background region — is not appropriate 
for ACIS data because the ACIS exposure map contains significant small-scale features caused by the dithering of 
CCD edges and bad columns. For example, a source lying in a so-called chip gap may have an effective exposure 
time that is less than half of nominal, while the majority of its background region may have nominal exposure. AE 
accounts for these exposure map features by adopting a background scaling equal to the ratio of the integrals of the 
exposure map over the source aperture and background region. 
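
The difference between geometric and exposure-weighted scaling can be seen in a toy calculation. The exposure map, pixel lists, and function names below are hypothetical; the sketch only illustrates the ratio of exposure-map integrals described above.

```python
def exposure_integral(exposure_map, pixels):
    """Sum the exposure map over a list of (row, col) pixels."""
    return sum(exposure_map[row][col] for row, col in pixels)


def background_scaling(exposure_map, src_pixels, bkg_pixels):
    """Ratio of exposure-map integrals: source aperture over background region."""
    return (exposure_integral(exposure_map, src_pixels)
            / exposure_integral(exposure_map, bkg_pixels))


# Toy normalized exposure map: column 0 lies in a chip gap (40% exposure).
exposure_map = [[0.4, 1.0, 1.0, 1.0, 1.0, 1.0] for _ in range(4)]
src_pixels = [(0, 0), (1, 0)]                                # in the gap
bkg_pixels = [(r, c) for r in range(4) for c in range(2, 6)]  # nominal exposure

exposure_scaling = background_scaling(exposure_map, src_pixels, bkg_pixels)
geometric_scaling = len(src_pixels) / len(bkg_pixels)
print(exposure_scaling, geometric_scaling)
```

In this example the geometric ratio would over-subtract background by a factor of 2.5, since the source aperture accumulated only 40% of the exposure seen by the background region.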



6. MERGING EXTRACTIONS FROM MULTIPLE OBSERVATIONS 

Multiple observations of the same or overlapping fields have occurred throughout the Chandra mission; evolving 
constraints on mission operations now commonly require projects to be split into short segments. When the aim 
points and roll angles are nearly identical, it is appropriate to join the segments and treat them as a single observation
(e.g., Getman et al. 2005). But when the observations are misaligned, extraction must be performed separately on
each observation.



6.1. Composite Data Products 

In principle, X-ray properties for a source could be estimated directly from the multiple extractions of the
source (multiple source spectra, background spectra, ARFs, and RMFs). For example, photometric quantities could 
be estimated by maximizing a likelihood function involving terms for all of the observations (source and background 
extractions). Similarly, a spectral model for the source could be derived by simultaneously fitting all of the extractions. 
However, this approach is impractical for the majority of faint Chandra sources, where a few counts are spread over 
a number of separate observations. 



The CIAO thread "Merging Data from Multiple Imaging Observations" (http://cxc.harvard.edu/ciao/threads/combine/) states:
"The merged event list should not be used for spectral analysis, since it does not contain sufficient information to generate correct response 
files. The recommended technique for the spectral analysis case is to generate separate PHA, RMF, and ARF files for each observation 
AE adopts an alternate strategy — multiple extractions of a source are merged into "composite" data products
(PSF model, source spectrum, background spectrum, ARF, and RMF), which are carried forward for analysis. Source
spectra and exposure times are summed. Weighted averages of the PSFs, ARFs, and RMFs are computed using IDL
and the FTOOLS addarf and addrmf. In order to support "grouping" of low-count spectra at later stages of the
analysis, the composite background spectrum is represented in the same way as the source spectrum, as an integer
"counts" vector (the sum of the extracted background spectra) with an associated scaling value (to account for the
different sizes of the source and background regions), rather than as a real-valued "rate" vector with uncertainties
on each rate value. Extraction data products from each observation are retained so that the observer can analyze
them separately (or simultaneously) if desired.

An important constraint on the design of the individual extractions arises from the decision to sum the integer- 
valued background spectra, namely that all the individual background regions must be designed to have similar 
scaling values. This constraint arises because each individual background spectrum must be fairly represented in 
the composite background in order for Poisson ($\sqrt{N}$) uncertainty estimates to apply to the composite background.
This can be understood by considering an extreme example. Imagine a background channel in which observation #1
produced 25 counts with a scaling of 25 and observation #2 produced 10,000 counts with a scaling of 10,000. The
scaled composite background rate would be $(25/25) + (1 \times 10^4 / 1 \times 10^4) = 2.0$. However, the Poisson uncertainty
estimated (by a fitting package) for that background rate would be far too small, since it is based on the notion
that 10,025 counts were observed. In fact, the 25-count extraction totally dominates the uncertainty on the scaled
composite background, which actually has an approximate uncertainty of
$\sqrt{(\sqrt{25}/25)^2 + (\sqrt{10^4}/10^4)^2} \approx 0.2$. AE includes
code and workflow recommendations to ensure that all extractions of a source result in similar scaling of their
background spectra.
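
The arithmetic of this extreme example can be checked with a few lines of code. The sketch assumes that the composite scaling is whatever value makes the summed counts reproduce the scaled rate; that definition and the function names are assumptions made for illustration.

```python
from math import sqrt


def scaled_background(counts, scalings):
    """Scaled background rate and its properly propagated Poisson uncertainty."""
    rate = sum(c / s for c, s in zip(counts, scalings))
    err = sqrt(sum(c / s**2 for c, s in zip(counts, scalings)))
    return rate, err


def naive_uncertainty(counts, scalings):
    """Uncertainty inferred from the summed counts and a single scaling."""
    total = sum(counts)
    rate = sum(c / s for c, s in zip(counts, scalings))
    effective_scaling = total / rate  # assumed composite scaling
    return sqrt(total) / effective_scaling


counts, scalings = [25, 10_000], [25.0, 10_000.0]
rate, err = scaled_background(counts, scalings)
naive = naive_uncertainty(counts, scalings)
print(rate, err, naive)
```

The naive estimate is roughly ten times smaller than the propagated one, which is why similar scaling values across extractions are required before integer background spectra are summed.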



6.2. Discarding Observations 

When multiple observations of a field are available, astronomers commonly choose to disregard observations 
that are judged to be unhelpful to whatever astrophysical measurement is desired. For example, in ground-based 
optical studies, exposures with unusually bad seeing or unusually high background might be ignored. In all ACIS 
studies, observers are encouraged to discard data obtained during periods of very high instrumental background
(due to solar activity). In ACIS studies involving multiple misaligned pointings, observers often choose to ignore
data taken far off-axis (where the Chandra PSF is significantly degraded) if on-axis coverage is available (e.g.,
Plucinsky et al. 2008). ACIS observers also commonly choose to search for sources both within each observation
separately and within observer-defined combinations of observations (e.g., Muno et al. 2009).



AE tries to formalize this common practice of ignoring unhelpful observations by implementing three data- 
selection strategies that are applied independently to each source. 



1. When the observer is interested in the validity of proposed sources using AE's Pb statistic (§4.3), then we recommend
an AE option that selects whatever subset of the available extractions optimizes (minimizes) Pb for
each source. Under this option, an extraction will be included only when the particular signal and background it
contributes lowers Pb. This option is appropriate when the observer wishes to adopt the scientific policy that a
source is deemed to exist if it is significant in any observation, or in any combination of observations. For highly 
variable astrophysical sources, such as young stars, this is a reasonable strategy even when multiple observations 



When a fitting package groups several spectral channels together, if the spectrum is expressed as integer counts then the Poisson
uncertainty on the group can be estimated accurately from the total number of counts in the group (e.g., via $\sqrt{N}$). In contrast, grouping
a spectrum expressed as a rate vector with uncertainties requires Gaussian propagation of those uncertainties; many channels in typical
ACIS spectra have zero or one count, with virtually meaningless uncertainties.

http://cxc.harvard.edu/ciao/threads/acisbackground/index.py.html

For convenience, this is commonly performed in the early stages of data analysis, guided by the damage that the enhanced background
will do to the sources most susceptible to background (e.g., diffuse sources). A somewhat better strategy would be to choose high-
background periods to discard on a source-by-source basis, since very bright point sources would benefit more from extra integration time
than from a reduction to their already insignificant background. AE has not yet adopted this optimum strategy because implementation
is difficult.

See Figure 8 in the HRMA User's Guide (http://cxc.harvard.edu/cal/Hrma/users_guide/).
have similar PSFs. The price paid for increased sensitivity to variable sources is an increased false detection rate
from the additional number of random "trials" that can produce spurious detections. 

2. When the observer is interested in the position of sources, then we recommend an AE option that selects whatever 
subset of the available extractions optimizes (minimizes) the expected position uncertainty (§7.1) for each source.
An extraction much farther off-axis than its peers will be included only when the disadvantage of its larger PSF
is outweighed by the advantage of averaging over more counts. Since we register each of our observations to an
absolute astrometric reference frame, all observations of a source are well-aligned and it is appropriate to estimate 
the position using only the best data we have. 

3. When the observer is interested in time-averaged photometric properties (e.g., fluxes, spectra) then selecting the 
extractions to merge becomes more problematic. Anytime extractions are discarded in order to optimize a pho- 
tometric quantity (e.g., SNR) a bias can be introduced into all photometric properties because the discarded 
extraction may have, by chance, lower observed flux than the long-term average. Thus, the observer must bal- 
ance two undesirable outcomes: sources whose photometric accuracy is degraded by including very poor-quality 
extractions, and sources whose photometry suffers the suspicion of bias because some extractions were discarded 
after examining their data. AE offers an option that strikes this balance by discarding extractions only when 
retaining them would drive the SNR of the merged data set significantly below the optimal SNR. The observer 
specifies the minimum acceptable ratio between the SNR achieved by the merge and the optimal SNR achievable 
by discarding more extractions. This strategy tolerates limited deterioration to the SNR in order to avoid photo- 
metric bias arising from data selection. Bright sources will tend to incorporate all observations since background 
is unimportant, whereas weak sources will tend to reject observations that have backgrounds much higher than 
their peers (e.g., those with much larger apertures). 
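
The third strategy can be sketched as a search over subsets of extractions. The code below is a minimal illustration, not AE's implementation: it uses a Gaussian-approximation SNR, a brute-force search that is practical only for a handful of observations, and hypothetical names (merged_snr, select_extractions) and example values.

```python
from itertools import combinations
from math import sqrt


def merged_snr(extractions):
    """SNR of net counts for merged extractions (Gaussian approximation).

    Each extraction is (src_counts, bkg_counts, bkg_scaling), where
    bkg_scaling is the background-to-source region scaling.
    """
    src = sum(s for s, _, _ in extractions)
    bkg = sum(b / k for _, b, k in extractions)
    var = src + sum(b / k**2 for _, b, k in extractions)
    return (src - bkg) / sqrt(var) if var > 0 else 0.0


def select_extractions(extractions, min_ratio):
    """Indices of the largest subset whose SNR is >= min_ratio * optimal SNR."""
    n = len(extractions)
    subsets = [c for size in range(1, n + 1)
               for c in combinations(range(n), size)]
    snr = {c: merged_snr([extractions[i] for i in c]) for c in subsets}
    optimal = max(snr.values())
    if optimal <= 0:
        return tuple(range(n))  # nothing to optimize; keep everything
    acceptable = [c for c in subsets if snr[c] >= min_ratio * optimal]
    # Prefer keeping more observations; break ties by higher SNR.
    return max(acceptable, key=lambda c: (len(c), snr[c]))


# Hypothetical source: one on-axis and one far off-axis extraction.
extractions = [(50, 20, 10.0), (5, 400, 10.0)]
print(select_extractions(extractions, min_ratio=0.9))
```

Lowering min_ratio makes the selection more tolerant of poor extractions, trading SNR for reduced risk of selection bias.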



7. SOURCE PROPERTIES 

After merging the appropriate set of extractions (observations) for each source, AE calculates various photo- 
metric quantities, estimates various source properties, fits astrophysical models to source spectra using XSPEC, and 
constructs light curves. These analysis capabilities are described in the AE manual, but a few warrant discussion 
here. 



7.1. Source Position 

While initial source positions are provided by a source detection process, these may not be optimal. AE provides 
three procedures for improving source positions. One position estimate is constructed by calculating the centroid 
of the extracted events. A second estimate is determined from the peak of the spatial correlation between the 
Chandra-ACIS PSF (Appendix C) and the events in a neighborhood around the source. A third position estimate
is obtained from the location of the peak in a maximum-likelihood (Lucy 1974) image reconstruction of the source



neighborhood. In a multi-observation reduction, the correlation and reconstruction operations are performed using 
a composite (multi-observation) event image and a composite PSF image because a given source may be undetected 
in an individual observation. 

In our experience, all three methods give very similar positions for most on-axis sources; the centroid position 
is simplest to calculate and is often used. The PSF correlation position is best for sources far off-axis (θ > 5');
centroid positions are biased estimates of the true locations due to the asymmetry of the off-axis PSFs. For very
closely-spaced sources with overlapping PSFs (Figure 6), the best positions are provided by the maximum-likelihood
reconstructed images. Since the centroid position is calculated using the extracted events, it should be re-calculated 
after a source is repositioned to verify convergence. 

Positional errors are estimated separately along the right ascension and declination axes according to 

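$\sigma_{\rm RA} = \sigma_{\rm model,RA} / \sqrt{N}$ (1)
$\sigma_{\rm Dec} = \sigma_{\rm model,Dec} / \sqrt{N}$ (2)
$\sigma_r = \sqrt{\sigma_{\rm RA}^2 + \sigma_{\rm Dec}^2}$ (3)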
where $\sigma_{\rm model,RA}$ and $\sigma_{\rm model,Dec}$ are the standard deviations along the right ascension and declination axes of a model
of the counts falling within the extraction region. That model consists of the PSF projected onto each axis plus
a flat background component appropriately scaled, both truncated by the source's extraction region; the spatial
distribution of the observed data is not considered. N is the total number of extracted counts. The quantity $\sigma_r$,
commonly known as the distance root-mean-square error, is generally reported as the "1$\sigma$" error, or 68% confidence
interval, for the source position.
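
As a worked example, the sketch below evaluates these error estimates under the assumption that each single-axis error is the model standard deviation divided by the square root of the number of extracted counts, and that the radial error is their quadrature sum. The function name and input values are hypothetical.

```python
from math import sqrt


def position_error(sigma_model_ra, sigma_model_dec, n_counts):
    """Return (sigma_ra, sigma_dec, sigma_r) in the units of the inputs."""
    sigma_ra = sigma_model_ra / sqrt(n_counts)
    sigma_dec = sigma_model_dec / sqrt(n_counts)
    sigma_r = sqrt(sigma_ra**2 + sigma_dec**2)
    return sigma_ra, sigma_dec, sigma_r


# Hypothetical source: model widths of 0.3" and 0.4" with 100 extracted counts.
print(position_error(0.3, 0.4, 100))
```

Because the error scales as the inverse square root of the counts, quadrupling the number of extracted counts halves the positional uncertainty for a fixed model width.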



7.2. Grouped Spectra 

Standard ACIS spectral files divide the energy range covered by the instrument into many hundreds of pulse- 
invariant (PI) energy channels, many of which have very few counts for typical sources. Energy channels are therefore 
frequently grouped. AE offers only one of the several grouping criteria found in standard grouping tools — groups are 
constructed to have similar signal-to-noise ratios — but improves on standard implementations in several important 
ways: 

1. Standard algorithms provide no convenient way for the observer to define precisely the energy range over which 
spectral models will be fit. AE allows the observer to specify the boundaries of the first and last groups. For 
example, if the energy range 0.5-8.0 keV is specified, then the first group will span all the channels below 0.5 keV 
and the last group will span all the channels above 8.0 keV; these terminal groups can be subsequently "ignored" 
within a fitting package. 

2. The criterion AE uses to define a group is a target SNR for the net counts in the group. Standard grouping tools 
do not consider the background spectrum, and thus produce background-subtracted grouped spectra with SNR 
values lower than the target. 

3. Standard algorithms are asymmetric, filling groups from low to high energies. For sources with fewer than several 
hundred counts, groups often begin with a string of empty channels and end on a channel that happens to have an 
observed event. This permits anomalies in the grouped spectra, such as narrow-width groups with overestimated
flux adjacent to wide groups with underestimated flux. These distortions can bias the spectral fitting and inflate
the best-fit value. AE's grouping algorithm attempts to mitigate this problem by selecting group boundaries
mid-way in the run of empty channels (if any) that lie between events belonging to adjacent groups. 
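
The first two improvements can be illustrated with a short sketch: terminal groups bracket the fitting range, and groups inside that range are closed when the SNR of the net (background-subtracted) counts reaches the target. This is not AE's algorithm. The mid-run boundary placement of item 3 is omitted, leftover channels are simply folded into the last in-band group, and the SNR formula is a Gaussian approximation; all names and values are hypothetical.

```python
from math import sqrt


def group_channels(src, bkg, bkg_scaling, target_snr, first, last):
    """Return half-open (start, stop) channel ranges for a grouped spectrum.

    src, bkg:    per-channel source and background counts.
    bkg_scaling: background-to-source region scaling.
    first, last: inclusive channel bounds of the fitting range.
    """
    groups = []
    if first > 0:
        groups.append((0, first))            # low-energy terminal group
    start, s, b = first, 0, 0
    for ch in range(first, last + 1):
        s += src[ch]
        b += bkg[ch]
        net = s - b / bkg_scaling
        var = s + b / bkg_scaling**2
        if var > 0 and net / sqrt(var) >= target_snr:
            groups.append((start, ch + 1))
            start, s, b = ch + 1, 0, 0
    if start <= last:
        # Leftover channels never reached the target SNR.
        if groups and groups[-1][0] >= first:
            groups[-1] = (groups[-1][0], last + 1)
        else:
            groups.append((start, last + 1))
    if last + 1 < len(src):
        groups.append((last + 1, len(src)))  # high-energy terminal group
    return groups


src = [0, 0, 4, 0, 0, 5, 1, 0, 3, 2, 0, 0]
bkg = [0] * len(src)
print(group_channels(src, bkg, 10.0, target_snr=2.0, first=2, last=9))
```

The two terminal groups returned by this sketch correspond to the groups that would subsequently be "ignored" within a fitting package.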



7.3. Non-parametric Characterization of Spectral Shape 

X-ray spectra observed with ACIS energy resolution typically show some combination of continuum components, 
line emission, and absorption features from interstellar matter. Accurate characterization requires fitting with
non-linear astrophysical models (§7.5). However, this fitting procedure does not give meaningful spectral parameters
for faint sources when a wide range of models are statistically consistent with the sparse data set. Non-parametric 
estimators of spectral shape — statistics that describe the shape of the observed spectrum without assuming any 
astrophysical model — are commonly used for weak sources. 

The oldest and most widely used statistic characterizing the shapes of X-ray spectra is the "hardness ratio" 
involving various ratios of soft and hard band counts. These estimators (known as 'Poisson proportions' in the statis- 
tical literature) are mathematically unstable for sources with extreme spectra where the numerator or denominator 
has few or zero counts (Brown et al. 2001). Hong et al. (2004) demonstrated that quartiles (three values of a random



variable that divide a population into four equal groups) of the observed event energies are superior characterizations 
of spectral shape for low-count observations, avoiding the problem of empty or nearly empty pre-defined energy 
bands. Quartile-quartile plots can be mapped to astrophysical parameters for specified simple model families. 

AE provides estimates of the 25% quartile, median (50% quartile), and 75% quartile of the observed event 
energies over a variety of bands. These statistics are background-corrected, meaning that they seek to characterize 
the observable spectrum of the astrophysical source as if background were not present. AE implements an intuitive 
and straightforward background correction for standard quartiles, based on the observed cumulative distribution of 
the net spectrum, as shown in Figure 9. This method appears to be equivalent to that described by Hong et al. 
(2004, Appendix C), which was developed independently. AE does not yet estimate individual confidence intervals 
for these background-corrected quartiles. 

Footnote: The Chandra Source Catalog takes a different approach to positional errors 
(http://asc.harvard.edu/csc/columns/positions.html), adopting error ellipses 
(http://asc.harvard.edu/csc/why/err_ellipse_msc.html) directly from the wavdetect program for single 
observations, and combining ellipses for multiple observations. 




[Figure 9: cumulative distribution of net counts versus Energy (keV).] 

Fig. 9. — Example calculation of the background-corrected median energy statistic. The 20 large upward jumps in 
the cumulative distribution of net counts (rising curve) represent the 20 counts observed in the source aperture; the 
100 small downward jumps (not individually visible) represent the 100 counts observed in the background region, 
scaled down to match the source aperture size. The lowest energy (left vertical line) and highest energy (right vertical 
line) at which the 50th percentile (dotted line) is reached are averaged to produce an estimate of the median energy 
of the parent source. Here, the median energy is ~3.6 keV. 
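The calculation illustrated in Figure 9 can be sketched in a few lines of Python. This is only an illustration of the idea described in the caption, not AE's actual code: the function name, its arguments, and the rule used to detect where the 50% level is "reached" (an upward crossing of the cumulative curve) are assumptions made for this sketch.

```python
def background_corrected_median(src_energies, bkg_energies, bkg_scale):
    """Estimate the median energy of the net (source minus background) spectrum.

    src_energies: energies (keV) of events in the source aperture
    bkg_energies: energies (keV) of events in the background region
    bkg_scale:    weight of one background event (source/background "area" ratio)
    All names are hypothetical; this is not the AE implementation.
    """
    # Each source event is an upward jump of 1 in the cumulative net-count
    # curve; each background event is a downward jump of bkg_scale.
    steps = [(e, 1.0) for e in src_energies] + [(e, -bkg_scale) for e in bkg_energies]
    steps.sort()

    net_total = len(src_energies) - bkg_scale * len(bkg_energies)
    if net_total <= 0:
        raise ValueError("net counts must be positive")
    half = 0.5 * net_total

    # Record every energy at which the curve rises through the 50% level.
    crossings = []
    cumulative = 0.0
    for energy, jump in steps:
        previous = cumulative
        cumulative += jump
        if previous < half <= cumulative:
            crossings.append(energy)

    # Average the lowest and highest such energies, as in the Figure 9 caption.
    return 0.5 * (crossings[0] + crossings[-1])
```

Because the net cumulative curve is not monotonic, the 50% level can be reached more than once; averaging the first and last crossings gives a single estimate without assuming any spectral model.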



Our own Chandra studies of young stars have chosen to use the median energy statistic, rather than multiple 
quartiles, for deriving intrinsic properties of absorbed thermal plasmas (§7.4). Many sources in these studies are so 
weak that dividing the few observed counts into multiple quartiles is not feasible. 



7.4. Photometry 



The common procedure for obtaining broad band source fluxes and luminosities (given astronomical distances) 
is to fit the spectrum derived above with astrophysical models convolved with the ARFs and RMFs representing the 
observatory response. This path (which we follow in §7.5 for strong sources) is less reliable for faint sources with 
insufficient photons to constrain the nonlinear fitting process. We describe here a photometric procedure to estimate 
photon fluxes. 

AE begins by performing standard aperture photometry, computing "net counts" quantities for various energy 
bands, $S(E)$, by subtracting a scaled background from the counts observed near the source: 

$S(E) = C^{s}(E) - (A^{s}/A^{b})\, C^{b}(E)$    (4) 

$C^{s}(E)$ is the number of counts observed in the source aperture in energy band $E$. $C^{b}(E)$ is the number of counts 
observed in the background region in energy band $E$. The source and background "areas", $A^{s}$ and $A^{b}$, are derived 
by integrating exposure maps rather than computing geometric areas (§5.4). Note that these raw "net counts" 
photometric quantities are not adjusted to correct for finite apertures. Instead, an energy-dependent aperture 
correction (§5.3) is applied to the calibration of the extraction (the ARF file); thus, any calibrated photometric 
quantities (such as the flux estimates below) will include the aperture correction. 

Asymmetric confidence intervals for both $C^{s}$ and $C^{b}$ are estimated using the common analytical approximations 
to confidence intervals of the Poissonian distribution constructed by Gehrels (1986, equations 7 and 12). These 
confidence intervals are propagated through equation 4 using the method described by Lyons (1991, equation 1.31), 
to estimate an asymmetric confidence interval for $S(E)$. This technique is not ideal, and we anticipate adopting the 
Bayesian technique for estimating confidence intervals used by the Chandra Source Catalog, which is implemented 
in the CIAO tool aprates. 

Footnote: http://cxc.harvard.edu/csc/why/ap_vals_errs.html 



These standard "net counts" quantities are used to construct two estimates of incident photon flux onto the 
Chandra telescope, designated here $F^{1}$ and $F^{2}$, in units of photon cm$^{-2}$ s$^{-1}$ (not the usual 
erg cm$^{-2}$ s$^{-1}$). For the first estimate, the net count rate is divided by the observatory response in each of 
many narrow channels, and these nearly-monochromatic photon flux densities are summed over a chosen energy band: 

$F^{1}_{\rm phot}(E) = \frac{S(E)}{\mathrm{EXPOSURE} \times \mathrm{ARF}(E)}$    (5) 

$F^{1}_{\rm phot}(E_{\min} < E < E_{\max}) = \sum F^{1}_{\rm phot}(E)$    (6) 

where $S(E)$ is the net observed counts (equation 4), $\mathrm{ARF}(E)$ is the observatory effective area (§5.2), EXPOSURE is 
the source exposure time, and $E_{\min}$ and $E_{\max}$ are the energy range provided by the user. Commonly used energy 
bands are $E_{\min} = 0.5$ keV and $E_{\max} = 2$ keV for the Chandra "soft band" and $E_{\min} = 2$ keV and $E_{\max} = 8$ keV 
for the Chandra "hard band." 

For the second photon flux estimate, the net count rate in a user-supplied band is divided by the observatory 
response averaged over the band: 

$\mathrm{ARF}(E_{\min} < E < E_{\max}) = \langle \mathrm{ARF}(E) \rangle$    (7) 

$F^{2}_{\rm phot}(E_{\min} < E < E_{\max}) = \frac{\sum S(E)}{\mathrm{EXPOSURE} \times \mathrm{ARF}(E_{\min} < E < E_{\max})}$    (8) 

The $F^{1}$ estimator can suffer from large Poisson errors because events (either source or background) at energies 
where the ARF is very small have a large effect on the estimator, and is thus not recommended for weak sources. 
For example, an event at 8 keV where the ARF value is tiny makes a much larger contribution to $F^{1}$ than an event 
at 2 keV where the ARF value is large. The $F^{2}$ estimator suffers from a systematic bias with respect to the true 
incident photon flux because the effective area averaged over the energy band (equation 7) is the correct calibration 
of the net counts photometry only for the non-physical case of a flat incident spectrum. 
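The two estimators can be compared with a minimal Python sketch. It assumes the net counts and ARF are supplied as per-channel lists covering the chosen band, and that the band-averaged ARF is a simple unweighted mean over those channels; the function and argument names are hypothetical and are not part of AE.

```python
def photon_flux_estimates(net_counts, arf, exposure):
    """Return (flux1, flux2) in photon cm^-2 s^-1 for one energy band.

    net_counts: net counts S(E) in each narrow channel of the band
    arf:        effective area in each channel (cm^2 count/photon)
    exposure:   exposure time (s)
    """
    if len(net_counts) != len(arf):
        raise ValueError("net_counts and arf must have the same length")

    # First estimator: divide by the response channel by channel, then sum.
    flux1 = sum(s / (exposure * a) for s, a in zip(net_counts, arf))

    # Second estimator: total net counts divided by the band-averaged response
    # (assumed here to be an unweighted mean over the channels).
    mean_arf = sum(arf) / len(arf)
    flux2 = sum(net_counts) / (exposure * mean_arf)
    return flux1, flux2


# One net count where the ARF is large and one where it is tiny.
flux1, flux2 = photon_flux_estimates([1.0, 1.0], [400.0, 10.0], 1.0)
```

In this toy case the single event in the low-ARF channel dominates the first estimate, which is the Poisson sensitivity described above; for a flat ARF the two estimates coincide.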

When a priori information provides constraints on the shape of a source's intrinsic spectrum, these photon flux 
estimates $F^{1}$ and $F^{2}$ can be combined with the background-corrected median energy (§7.3), $E_{\rm median}$, and with the 
astronomical distance $d$ to estimate apparent source luminosities $L$ in a chosen broad energy band: 

$L \approx 4\pi d^{2}\, [F^{1}\ \mathrm{or}\ F^{2}]\, E_{\rm median}$.    (9) 



Getman et al. (2010) have studied in detail the accuracy and precision of this estimate for both bright and faint 
sources, particularly when absorption from line-of-sight interstellar matter is present. $E_{\rm median}$ is an accurate, though 
nonlinear, predictor of interstellar column density $N_{\rm H}$ when the intrinsic family of spectral models (e.g., power law, 
thermal plasma) for the source is known (Feigelson et al. 2005; Getman et al. 2010, Figure 4). $E_{\rm median}$ thus enters 
the calculation twice, once to scale the photon flux to the observed luminosity in equation 9, and again to scale 
the observed luminosity to the intrinsic luminosity (corrected for absorption; Getman et al. 2010, Figure 5). 

Simulations (Getman et al. 2010) indicate that these nonparametric estimates of observed luminosities and 
absorption column densities are quite accurate, with only moderate biases and statistical errors. Systematic errors 
are larger for estimates of the absorption-corrected luminosities, and can be extremely large for heavily absorbed 
sources when the chosen spectral band includes the soft X-ray regime. In our work on faint sources, we choose to use 
the more statistically stable $F^{2}$ photon flux estimator and prefer absorption-corrected luminosities in the hard band 
(2-8 keV), which is less vulnerable to errors in absorption correction, over luminosities in the total band (0.5-8 keV). 



Footnote: A similar method is used in the Chandra Source Catalog, where it is called "aperture source energy flux" 
(http://asc.harvard.edu/csc/columns/fluxes.html). 
7.5. Spectral Fitting 



AE relies on the widely-used XSPEC package (Arnaud 1996) for spectral fitting. XSPEC provides a wide 
range of intrinsic spectral models (such as non-thermal power laws and thermal plasmas), absorption by intervening 
material, and convolution with the Chandra/ACIS instrumental response. Models derived from $\chi^2$ minimization 
become inaccurate for sources with few counts, and the problem is exacerbated when a significant fraction of the 
extracted events are background that is subtracted. Spectral fitting of faint sources is thus usually pursued using the 
C-statistic applied to unbinned data, an application of the Likelihood Ratio Test under the assumption that the data 
follow the Poisson distribution (Cash 1979). Background subtraction is not possible for likelihood-based statistical 
procedures; instead, a background model must be included in the spectral model that is fit to the data (source plus 
background) extracted from the source aperture, and that background model should be simultaneously fit to the 
data extracted from the background region. This fitting procedure is depicted in Figure 10 as a data flow diagram. 
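The structure of this simultaneous fit can be sketched as an objective function. The sketch below assumes the Cash statistic in the form $C = 2\sum (m - n \ln m)$, which differs from other common variants only by terms that do not depend on the model, and it assumes the predicted spectra have already been folded through the instrumental response. The function names and the `bkg_scale` argument are hypothetical; this is not XSPEC or AE code.

```python
import math


def cash_statistic(observed, predicted):
    """Cash (1979) statistic, C = 2 * sum(m - n * ln m), for Poisson data.

    observed:  counts n in each channel
    predicted: model-predicted counts m in each channel (must be positive)
    """
    total = 0.0
    for n, m in zip(observed, predicted):
        if m <= 0:
            raise ValueError("predicted counts must be positive")
        total += m - n * math.log(m)
    return 2.0 * total


def joint_statistic(src_observed, bkg_observed, source_model, background_model, bkg_scale):
    """Objective for fitting source and background models simultaneously.

    The source aperture is predicted to contain the source model plus a
    scaled copy of the background model; the background region is predicted
    by the background model alone. Adding the two C values corresponds to
    multiplying the two likelihoods.
    """
    src_predicted = [s + bkg_scale * b for s, b in zip(source_model, background_model)]
    return (cash_statistic(src_observed, src_predicted)
            + cash_statistic(bkg_observed, background_model))
```

A fitting program would minimize `joint_statistic` over the parameters that generate `source_model` and `background_model`; no background is ever subtracted from the data.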



[Figure 10: data flow diagram. Box labels: spectrum from source region; spectrum from background region; extracted 
instrumental spectrum (two); model incident spectra; source model (e.g. tbabs*vapec); background model (e.g. cplinear); 
ACIS response; flat response; predicted instrumental spectra; C-statistic (two); minimize C-statistic.] 

Fig. 10. — Data flow diagram of parametric fitting procedure using the C-statistic with the cplinear background 
spectral model (§7.5). The left plus sign represents the addition of two predicted instrumental spectra, and the right 
plus sign represents the addition of two C-statistic values corresponding to multiplication of the likelihood of the 
source and background data. 



XSPEC implements such a background model, which was derived by Wachter et al. (1979), for use with the 
C-statistic. This model consists of a free parameter representing the background flux in each of the hundreds of 
spectral channels used in an ACIS spectrum. Such a model raises two theoretical concerns. First, in most cases 
where the C-statistic would be used, there are many more free model parameters than background events. Second, 
such a model is completely unconstrained: extremely narrow and quite non-physical features (one channel wide) in 
the incident background spectrum are presumed to be possible. Indeed, observers have found that this algorithm 
sometimes provides very poor "best fits" to the observed spectrum, with incorrect normalizations and non-physical 
spectral parameters. 

We found that substituting a more constrained model for the ACIS background stabilized the algorithm and 
lessened the problem of non-physical fits. For sources with relatively few (~100) background counts, we use a 
continuous piecewise-linear function with up to 10 vertices, which we call the cplinear model; an example is shown 
in Figure 11. Figure 12 gives an example where the background model of Wachter et al. (1979) gives a poor fit and 
the cplinear model gives a good fit. 

Footnote: The Sherpa package (http://cxc.harvard.edu/sherpa/) provides similar fitting capabilities, but the authors 
had more experience with XSPEC at the time AE development began. 

Footnote: See Appendix B of the XSPEC manual 
(http://heasarc.gsfc.nasa.gov/docs/xanadu/xspec/manual/manual.html). 

Footnote: See http://xspector.blogspot.com/2005/07/cstat.html and 
http://xspector.blogspot.com/2007/01/cstat-with-background-files-and-low.html 

Footnote: See the discussion of the cplinear model in the AE manual. A comparison of source models using the 
Wachter et al. (1979) and cplinear background models for 900 ACIS sources is available in Appendix B of that document. 
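To make concrete what "continuous piecewise-linear" means, here is a minimal Python sketch of evaluating such a function from a short list of vertices. It is not the cplinear implementation distributed with AE; the function name, the (energy, rate) vertex representation, and the example vertex values are assumptions for illustration.

```python
def piecewise_linear(vertices, energy):
    """Evaluate a continuous piecewise-linear function at one energy.

    vertices: list of (energy, rate) pairs with strictly increasing energies;
              the free parameters of the model are the rates at the vertices.
    energy:   energy (keV) at which to evaluate the function
    """
    for (e0, r0), (e1, r1) in zip(vertices, vertices[1:]):
        if e0 <= energy <= e1:
            # Linear interpolation between the two bracketing vertices.
            return r0 + (r1 - r0) * (energy - e0) / (e1 - e0)
    raise ValueError("energy outside the range covered by the vertices")


# Hypothetical three-vertex background shape spanning 0.5-8 keV.
example_vertices = [(0.5, 1.0), (2.0, 4.0), (8.0, 1.0)]
rate_at_5_kev = piecewise_linear(example_vertices, 5.0)
```

With at most 10 vertices the model has at most 10 free parameters, in contrast to one free parameter per spectral channel, which is the constraint that the text argues stabilizes the fit.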
Fig. 11. — Example of a cplinear background spectral model (§7.5) for a point source coincident with soft diffuse 
emission. The model shown has 8 vertices (7 linear segments). The ordinate axis uses an arbitrary linear scale. 

The highly constrained cplinear model is inadequate to model high quality background spectra which exhibit 
complex structure. More physically-based background models would be useful to the ACIS community; our goal 
here is to emphasize that severely under-constrained background models should be avoided, rather than to claim 
that cplinear is optimal. The ultimate solution to modeling low-count X-ray spectra may lie in Bayesian approaches, 
such as the innovative technique of van Dyk et al. (2001). Those authors also suggest that background models with 
appropriate functional forms are useful; however, the background model presented in that published work employed 
a free parameter for each channel in the spectrum (van Dyk et al. 2001, eq. 22), similar to the Wachter model. 



Another algorithmic innovation to spectral fitting provided by AE is the improved method for grouping spectra 
(either for fitting with the $\chi^2$ statistic or for plotting ungrouped spectra fit with the C-statistic) described earlier 
(§7.2). Finally, AE also offers the observer several procedural conveniences. XSPEC scripts that drive fits to three 
common astrophysical models (absorbed one- and two-temperature thermal plasmas, absorbed power law), using 
either $\chi^2$ or the C-statistic with the cplinear background model, are provided. AE automates the execution of these 
fitting scripts; fitting multiple models to thousands of sources is feasible. When multiple models have been fit to a 
source, the observer can visualize those models and select the most appropriate one using the graphical interface in 
the AE tool Spectra Viewer; an example screen shot is shown in Figure 13. Numerical output of the fitting process 
(in the form of a FITS table) includes best-fit spectral parameters with their confidence intervals and model fluxes 
integrated over various spectral bands. 



7.6. Source Variability 

For visualization of source variability, light curves for all observations of the source are depicted on a single plot, 
which is shown in calibrated units (photon ks$^{-1}$ cm$^{-2}$) rather than observed units (count ks$^{-1}$) to avoid spurious 
apparent variability arising from differences in observatory response or PSF fraction between observations. Two light 
curves are constructed. One is a histogram with variable-width time bins, constructed by the same algorithm used 
to group spectra (§7.2). The second is adaptively smoothed using a box kernel whose size is adjusted to encompass 
a specified number of events. The adaptive smoothing process also produces a time series reporting the median 
energy of the observed events falling within each kernel, providing a visualization of spectral variation over time. For 
each observation of a source, AE also produces a "photon arrival diagram", a scatter plot of event arrival time and 
energy. Figure 14 shows an example multi-observation light curve and a photon arrival diagram for a flaring source. 



For quantification of variability, the null hypothesis of a uniform source flux is tested using a one-sample 
Kolmogorov-Smirnov statistic. For multi-observation cases, variability is tested both within and between 
observations (accounting for variations in effective area among the observations). 

The current timing analysis has some deficiencies. No tests for autocorrelation, periodicity, or other temporal 
behaviors are made. The KS statistic is insensitive to variations near the beginning or end of the observation. 
Variability in the background is not considered in the uniform flux model, and background is not accounted for 
in the light curves or median energy time series. Although the ACIS background is often quite stable within an 
observation, multiple observations of a source can easily suffer very different background count rates within the 
source apertures because the aperture sizes can be quite different. Thus, for example, a spurious indication of 
variability can be produced for a weak source observed at two very different off-axis angles. We plan to eliminate 
this problem by adding to the uniform light curve model independent background levels for each observation. 

Footnote: The Chandra Source Catalog also provides variability tests 
(http://asc.harvard.edu/csc/columns/variability.html) based on the variance, Kuiper statistic, and Bayesian method 
(http://asc.harvard.edu/csc/why/gregory_loredo.html) of Gregory & Loredo (1996). 

Fig. 12. — Example of spectral models for a faint stellar member of the M 17 star forming region derived with the 
C-statistic in XSPEC using the Wachter et al. (1979) (top panel) and cplinear (bottom panel) background models 
(see Figure 10). Residuals are shown at the bottom of each panel. The top panel shows the observed cumulative 
net spectrum (black stair-step curve) and the best-fit thermal plasma model (blue continuous curve) for the star; 
the Wachter background model is not shown. Note that the fit is incorrect above 1 keV. The bottom panel shows 
cumulative spectra from the source aperture and background region separately. XSPEC is configured so that the 
spectrum extracted from the background region (red stair-step curve) is compared to a cplinear model (red continuous 
curve), and the spectrum extracted from the source aperture (black stair-step curve) is compared to the sum (black 
continuous curve) of a scaled copy of the cplinear model (purple curve) and a thermal plasma model of the star (blue 
curve). Best-fit parameters for the two models are then derived simultaneously. 
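For a single observation, the one-sample Kolmogorov-Smirnov test against a uniform flux can be sketched as follows. This is an illustration under stated assumptions, not AE's code: it ignores gaps in the exposure and background events, the function names are hypothetical, and the p-value uses the standard asymptotic Kolmogorov series with a commonly used small-sample correction factor.

```python
import math


def ks_statistic_uniform(arrival_times, t_start, t_stop):
    """Largest distance between the empirical cumulative distribution of
    event arrival times and the straight line expected for a constant source."""
    times = sorted(arrival_times)
    n = len(times)
    d = 0.0
    for i, t in enumerate(times):
        model_cdf = (t - t_start) / (t_stop - t_start)
        # The empirical CDF steps from i/n to (i+1)/n at each event.
        d = max(d, (i + 1) / n - model_cdf, model_cdf - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Approximate probability of a distance >= d under the null hypothesis."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        # The p-value differs from 1 by about 1e-5 or less here, and the
        # alternating series below converges slowly for small lam.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

A small p-value argues against a constant source. As noted in the text, this statistic is least sensitive to changes near the start or end of the observation, where the two cumulative curves are pinned together.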
Fig. 13. — Screen shot of three spectral fits to an AE source presented in the Spectra Viewer tool. An IDL "widget" 
(top) provides navigation controls to select a source from the catalog, displays tabular information about all the 
spectral fits available for the selected source, and allows the observer to choose the preferred fit. Standard XSPEC 
plots from each spectral fit are shown in separate ghostview windows. A FITS header containing various properties 
of the selected source is shown (lower left window). 



7.7. Averaged or "Stacked" Properties of Source Samples 

It is often desirable to "stack" a sample of similar sources to construct a summed spectrum with higher SNR than 
available from individual objects. This can be performed either for sources detected in the ACIS data or at locations 
of objects known only from independent observations. Stacking many similar objects can thus give extremely long 
effective exposures and sensitivities; however, stacking results can be dominated by the few brightest objects in the 
group. This technique has been widely used in Chandra deep extragalactic field studies (e.g., Hornschemeier et al. 
2001; Brandt et al. 2001; Lehmer et al. 2007). 



In AE, stacking is achieved by relabeling extraction subdirectories of interesting sources to resemble extractions 
of distinct observations of a single source. Then the extractions are merged (§6) and source properties are estimated 
normally (§7). 
Fig. 14. — Concatenated light curves for a source observed six times (six numbered epochs in upper panel) and 
photon arrival diagrams for two of those observations (lower panels). The light curves show variations in photon flux 
(black histogram bins and blue adaptively smoothed curves) and median energy (red histogram bins); the spectrum 
is seen to harden when the source flares. The photon arrival diagrams depict the arrival time and energy of each 
event (black dots) in a single observation, the cumulative distribution of event arrival times (red curve), and the 
cumulative distribution expected from a constant source (magenta curve). 



7.8. Collation of Multi-source Tables 



The properties of individual sources obtained by AE can be collated into a summary FITS table with >100 
columns and a row for each source. The scientist can then study relationships among quantities using, for example, 
the fv or chips FITS viewers, IDL two- and three-dimensional plotting, and multivariate statistical analysis. AE 
also prepares standard quantities for publication as LaTeX tables, illustrated by Tables 1 and 2. This feature is 
particularly valuable for fields with hundreds or thousands of sources. 

Footnote: R is a large, public-domain (http://www.r-project.org) statistical analysis software package. 



8. MATCHING POINT SOURCE CATALOGS 

The task of "matching" point source catalogs (identifying entries in two or more catalogs that are believed 
to represent the same physical source) is fundamental to astronomical research, and is becoming increasingly 
common as publicly available and high quality catalogs covering many wavebands proliferate. As yet, no standard 
algorithm for this task has been adopted by the community, and thus no single software tool for matching is in 
widespread use. 



We currently use a matching tool written by one of the authors, the IDL program match_xy in TARA (Broos et al. 
2007). The basic criterion used to evaluate whether a proposed pair of sources from two catalogs could be detections 
of the same physical object is very simple. Given a significance threshold chosen by the observer, Gaussian positional 
uncertainties specified individually for the two sources are used in a classical test of the null hypothesis that they are 
observations of the same object. This matching criterion does not represent an innovation; the astronomy literature 
contains several more sophisticated catalog matching algorithms, such as ones that provide individual probabilities 
that each asserted match is merely a chance coincidence (Downes et al. 1986; Sutherland & Saunders 1992; Rutledge 
et al. 2000; Bauer et al. 2000), ones that consider non-positional source properties (such as fluxes) (Brand et al. 
2006; Budavári & Szalay 2008), and ones that operate on more than two catalogs simultaneously (e.g., the XMatch 
algorithm in the SkyQuery service designed for the Virtual Observatory). 

However, we do recommend four features of our matching tool for use in any catalog matching procedure applied 
to Chandra sources. First, individual estimates of positional uncertainty for each source, rather than a canonical 
value for the whole catalog, are employed because they often span a wide range due to Chandra's variable PSF; 
the uncertainties estimated by AE (§7.1) for a recent project range from <0.1" (not including systematic errors) to 
>2". Second, our algorithm enforces a one-to-one relationship among the pairs of sources declared to be "successful" 
matches, i.e., no source participates in multiple successful matches. The algorithm reports a "failed" match when 
a pair of sources pass the matching criterion, but one of them is already participating in another (more reliable) 
match. Third, we have found that visual depictions of the matching results, such as the DS9 region files produced 
by match_xy, are invaluable aids to reviewing the general performance of the algorithm and to understanding specific 
sources. We have used these region files in a prototype Source Viewer tool that facilitates visual examination of the 
vicinity of each source in two or more images (e.g., X-ray and near-infrared), and records the observer's subjective 
comments on individual sources. Fourth, all matching procedures should measure and correct astrometric offsets 
between catalogs at the outset; any large astrometric uncertainty should be included in the individual estimates of 
positional uncertainty. 
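The first two recommendations (per-source uncertainties and one-to-one matching) can be illustrated with a short Python sketch. It is not the match_xy algorithm. It assumes independent, circular Gaussian position errors, so that under the null hypothesis the probability of a separation at least as large as the observed $r$ is $\exp[-r^{2}/2(\sigma_a^{2}+\sigma_b^{2})]$, and it ranks reliability by that probability; the function name, the tuple layout, and the default threshold are hypothetical.

```python
import math


def match_catalogs(cat_a, cat_b, threshold=0.01):
    """Match two catalogs of (x, y, sigma) tuples given in the same units.

    Returns (successful, failed): lists of (index_a, index_b) pairs.
    A pair is a candidate if the null hypothesis "same object" is not
    rejected at the chosen significance threshold.
    """
    candidates = []
    for i, (xa, ya, sa) in enumerate(cat_a):
        for j, (xb, yb, sb) in enumerate(cat_b):
            r2 = (xa - xb) ** 2 + (ya - yb) ** 2
            # Chance of a separation this large or larger if both entries
            # are detections of one object (circular Gaussian errors assumed).
            p_value = math.exp(-0.5 * r2 / (sa ** 2 + sb ** 2))
            if p_value >= threshold:
                candidates.append((p_value, i, j))

    # Accept the most reliable candidates first and enforce one-to-one matches.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    successful, failed = [], []
    for p_value, i, j in candidates:
        if i in used_a or j in used_b:
            failed.append((i, j))  # passes the test, but a partner is taken
        else:
            successful.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return successful, failed
```

The brute-force double loop is adequate for a sketch; a production tool would restrict the search to nearby pairs.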



9. ANALYSIS OF DIFFUSE X-RAY STRUCTURES 

Extended emission from hot diffuse plasma is commonly present in X-ray studies. This occurs in galaxy groups 
and clusters, elliptical and spiral galaxies, Galactic star formation and starburst regions, supernova remnants, 
planetary nebulae, and wind-blown bubbles. Since point sources are often superposed onto the diffuse emission, the 
first step in our diffuse analysis workflow (Figure 1) is to identify and mask (remove) as many point sources as 
possible, using the procedures described in §4 through §5.4. Observers must be aware that fainter undetected point 
sources may be present in some fields; interpretation of diffuse emission should consider this corruption by 
undetected point sources. Diffuse analysis proceeds with smoothing the resulting source-free image, defining regions 
that contain diffuse structures of interest, and extracting the X-ray data from those regions. 



Footnote: ://openskyquery.net/Sky/SkySite/help/algo.aspx 
9.1. Smoothing of Source-Free Images 



As described in §5.4 and Figure 2, AE provides two methods for constructing mask regions that remove point 
source photons that would contaminate an analysis of diffuse emission. The remaining event distribution, consisting 
of instrumental background and possible astrophysical diffuse emission, is often best studied in smoothed images. 
Although the CIAO task csmooth (Ebeling et al. 2006) is effective and popular for smoothing unmasked data that 
contain both point sources and diffuse emission, it is poorly suited to working with images that contain unobserved 
regions, either masked point sources or regions beyond the edges of the detector, because the tool does not allow a 
field mask to be supplied. The CIAO thread for diffuse emission with embedded point sources replaces pixel values 
in source regions of an image with values interpolated from surrounding background regions using random Poisson 
numbers or local polynomial regression. 

We prefer to apply an adaptive kernel smoothing tool in TARA that handles excised pixels and field edges, 
which is similar to the asmooth tool in the XMM-SAS package and the new dmimgadapt tool in CIAO 4.2. Three 
smoothing kernels are available: a top-hat (or Heaviside function), a symmetric two-dimensional Gaussian, and a 
two-dimensional Epanechnikov kernel (Silverman 1992). The observer supplies a threshold SNR value; at each pixel 
position, the algorithm finds the smallest kernel that achieves this SNR target and estimates an incident photon flux 
(photon s$^{-1}$) by dividing the number of counts under the kernel by the integrated exposure map under the kernel. 
Maps of photon flux, flux error, and kernel size are produced. The observer can choose to retain or smooth over the 
holes in the data introduced when point sources are masked. The method is described in detail, with sample images, 
in Townsley et al. (2003, Appendix C). 
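The top-hat variant of this procedure can be sketched in pure Python. This is not the TARA tool: the function name and arguments are hypothetical, the SNR of a kernel is assumed to be simply the square root of the enclosed counts (no background term), and masked or unobserved pixels are assumed to be represented by zero counts and zero exposure.

```python
import math


def adaptive_tophat_smooth(counts, exposure, snr_target, max_radius=50):
    """Adaptive top-hat smoothing of a counts image with an exposure map.

    counts, exposure: 2-D lists of equal shape; masked pixels hold 0 in both.
    Returns (flux, radius_map), where flux is counts divided by the summed
    exposure map under the smallest circular kernel reaching snr_target.
    """
    ny, nx = len(counts), len(counts[0])
    flux = [[0.0] * nx for _ in range(ny)]
    radius_map = [[0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            for radius in range(max_radius + 1):
                total_counts = 0
                total_exposure = 0.0
                # Sum counts and exposure over pixels inside the kernel;
                # pixels beyond the image edge are simply not visited.
                for yy in range(max(0, y - radius), min(ny, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(nx, x + radius + 1)):
                        if (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2:
                            total_counts += counts[yy][xx]
                            total_exposure += exposure[yy][xx]
                # Assumed SNR model: Poisson counts with no background.
                if total_counts > 0 and math.sqrt(total_counts) >= snr_target:
                    break
            flux[y][x] = total_counts / total_exposure if total_exposure > 0 else 0.0
            radius_map[y][x] = radius
    return flux, radius_map
```

Because the flux is normalized by the exposure actually present under the kernel, a masked pixel is filled in from its neighbors without biasing the estimate, and kernels near field edges grow larger rather than reporting an artificially low flux.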



9.2. Extracting Diffuse Sources 

As the AE package does not attempt to delineate the morphology of diffuse structures, the observer must define 
each diffuse source of interest in the form of a DS9 region file that is accepted by CIAO. We have found the WVT 
Binning algorithm (Diehl & Statler 2006), which is a generalization of the Voronoi binning algorithm described 
by Cappellari & Copin (2003), to be a very effective method for tessellating a field with compact regions of similar 
SNR. Figure 16 shows an example of its use. 

The procedure for extracting each diffuse source with AE is analogous to the point source procedure. For each 
observation, events within the source region are extracted, calibration products are constructed, and background 
spectra are extracted. All the extractions are merged, photometric quantities are estimated, and spectra are fit via 
automated scripts. However, more complex procedures are required to calibrate diffuse extractions and to account 
for background, as described in the next two sections. 



9.3. Calibration of Diffuse Extractions 



The ARF data product constructed for point sources by the tool mkarf can be thought of more abstractly as 
an observatory response function, $\mathrm{ARF}(E,p)$, computed at a position on the sky, $p$, for a monochromatic energy $E$, 
with units of cm$^{2}$ count photon$^{-1}$. For Chandra data, this function represents both variations in the observatory 
response across the focal plane and the exposure time variations across the sky caused by dithering over bad detector 
pixels and detector edges. Essentially, $\mathrm{ARF}(E,p)$ describes the "depth" of the observation, since effective area and 
observing time are degenerate terms in a flux calculation. For example, $\mathrm{ARF}(E,p)$ trends downward with off-axis 
angle, due to vignetting of the mirrors. $\mathrm{ARF}(E,p)$ is reduced at any sky position that dithered across bad pixels or 
dithered off of ACIS. Conceptually, $\mathrm{ARF}(E,p)$ is zero in regions on the sky that have been masked to remove point 
sources (§5.4). 

Footnote (CIAO diffuse emission thread): http://asc.harvard.edu/ciao/threads/diffuse_emission/ 

Footnote (XMM-SAS): http://xmm.vilspa.esa.es/external/xmm_sw_cal/sas_frame.shtml 

Footnote (dmimgadapt): http://cxc.harvard.edu/ciao/ahelp/dmimgadapt.html 

Footnote (WVT Binning): http://www.phy.ohiou.edu/~diehl/WVT/ 

Footnote: We can provide the reader with a small tool that converts the output of the WVT Binning package into a 
set of DS9 region files. 

Footnote (ARF): http://cxc.harvard.edu/ciao/dictionary/arf.html 

Footnote: An ARF is a product of a mirror effective area with units of cm$^{2}$ and a detector quantum efficiency with 
units of count photon$^{-1}$. 

The $\mathrm{ARF}(E,p)$ function for any particular observation will vary within a diffuse extraction region, and in a 
multi-observation study may be zero over much of that region. Figure 15 shows an example diffuse region that is 
fully contained in one ACIS-I observation (left panel) but barely intersects another ACIS-I observation (right panel). 
Just as a diffuse extraction can be viewed as an integration of the observed counts over the extraction region, the 
proper calibration of that extraction can be viewed as an integration of $\mathrm{ARF}(E,p)$ over the region: 

$\mathrm{ARF}_R(E) = \int_R \mathrm{ARF}(E,p)\, dp$    (10) 
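On a pixel grid, the integral in equation 10 becomes a sum over the pixels of the region at a single energy. The following Python sketch assumes the response at that energy is available as a masked map (zero where the sky was masked or not observed) and that region membership is given as a boolean image; the function names and the numerical values are hypothetical. The second function, which averages over only the observed part of the region, is included purely for contrast and is not a description of any CIAO tool's internals.

```python
def integrate_response(response_map, in_region, pixel_area):
    """Sum a masked single-energy response map over a region (equation 10).

    response_map: 2-D list of ARF(E0, p) values, 0 where masked or unobserved
    in_region:    2-D list of booleans marking the pixels of the region
    pixel_area:   sky area of one pixel (e.g. arcsec^2)
    """
    total = 0.0
    for response_row, region_row in zip(response_map, in_region):
        for value, inside in zip(response_row, region_row):
            if inside:
                total += value * pixel_area
    return total


def mean_over_observed(response_map, in_region):
    """Average response over only those region pixels with nonzero response."""
    values = [value
              for response_row, region_row in zip(response_map, in_region)
              for value, inside in zip(response_row, region_row)
              if inside and value > 0]
    return sum(values) / len(values) if values else 0.0


region = [[True, True], [True, True]]
fully_observed = [[400.0, 400.0], [400.0, 400.0]]
barely_observed = [[400.0, 0.0], [0.0, 0.0]]
```

With these toy maps and a pixel area of 0.25, the integrated response is 400 for the fully observed case and 100 for the barely observed one, while the average over observed pixels is 400 in both cases; only the integral records how much of the region each observation actually covered.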




Fig. 15. — A diffuse extraction region (black polygon) that is fully observed by one ACIS-I observation (masked 
exposure map, left panel) and partially observed by another observation (masked exposure map, right panel). 

Note that this integration over a region on the sky gives $\mathrm{ARF}_R(E)$ units that include a geometric area term, 
such as cm$^{2}$ count photon$^{-1}$ arcsec$^{2}$. As $\mathrm{ARF}_R(E)$ is carried forward into photometry calculations and spectral 
fitting, all "flux" quantities derived should be interpreted as surface flux quantities, with arcsec$^{-2}$ appended to the 
units. Flux integrated over the diffuse region is then estimated by multiplying the surface flux by the region's total 
geometric area on the sky; the area lost to point source masks, bad pixels, and detector edges is already accounted 
for in $\mathrm{ARF}_R(E)$. 

Recasting the diffuse ARF of each extraction to include the notion of "area on the sky" is particularly convenient 
when multiple extractions are to be combined (i.e., merged by AE, just as is done with point sources). Just as each 
extraction of a point source may have a different PSF fraction (§5.3), each extraction of a diffuse source may have 
a different "area on the sky" (due to masking and detector boundaries). In both situations, corrections to the 
calibration are most intuitively applied to each extraction prior to merging. 

ARF_R(E) cannot be directly calculated with existing tools. The CIAO point source tool mkarf implements 
most of the behavior of the integrand ARF(E, p) described above, but has no mechanism to set ARF(E, p) to zero 
at positions where the event data have been masked. The CIAO tool mkwarf ("make weighted ARF") implements 
an effective area averaging calculation that is related to Equation 10; however, for each observation mkwarf averages 
over only the detector area that intersects the diffuse region, not over the entire diffuse region, which may include 
areas with zero response due to masking or detector boundaries. The difference between the diffuse ARF we have 
described here, ARF_R(E), and the data product returned by mkwarf can be summarized with the aid of Figure 15. 
For both extractions shown there, the mkwarf data products will have similar normalizations, because responses are 
averaged over the detector areas intersecting the extraction region. In contrast, ARF_R(E) will be much larger for 
the left-hand extraction than for the right-hand extraction, because responses are integrated over the entire region. 



The Chandra convention is that the exposure time (FITS keyword "EXPOSURE") recorded for a point source is always the nominal 
exposure time for the observation. If necessary, a source's ARF is reduced to account for the amount of time the source was not observed 
due to dither motion. 
Although the full ARF_R(E) function is not available as a standard data product, the value of ARF_R(E) at a 
single energy E_0 is easily calculated by integrating over the extraction region a standard CIAO exposure map built 
for a monochromatic energy E_0 and masked in the same way as the event data. In the context of an AE extraction, 
the exposure map is "un-normalized" with units of s cm^2 count photon^-1, built by supplying the "normalize=no" 
option to the CIAO tool mkexpmap. This exposure map is related to the integrand ARF(E, p) as 

$$ \mathrm{emap}_{E_0}(p) = \mathrm{EXPOSURE} \times \mathrm{ARF}(E_0, p), \qquad (11) $$

where EXPOSURE is the "exposure time" of the observation. Thus, the diffuse ARF we seek is easily calculated at 
one energy as 

$$ \mathrm{ARF}_R(E_0) = \frac{\int_R \mathrm{emap}_{E_0}(p)\, dp}{\mathrm{EXPOSURE}} \qquad (12) $$

With the normalization of ARF_R(E) established at energy E_0, AE then relies on the mkwarf result to establish 
the shape of ARF_R(E) as a function of energy. AE's complete calculation of ARF_R(E) is thus 

$$ \mathrm{ARF}_R(E) = \frac{\mathrm{ARF}_{mkwarf}(E)}{\mathrm{ARF}_{mkwarf}(E_0)} \, \frac{\int_R \mathrm{emap}_{E_0}(p)\, dp}{\mathrm{EXPOSURE}} $$

where ARF_mkwarf is constructed by mkwarf. 
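
The calculation above can be sketched numerically as follows. This is only an illustration, not AE's implementation (AE is written in IDL): the function name, the argument names, and the assumption that the masked exposure map has already been cut out to the extraction region on a grid of equal-area pixels are all hypothetical.

```python
def diffuse_arf(emap_e0, pixel_area_arcsec2, exposure_s, arf_mkwarf, arf_mkwarf_e0):
    """Return the diffuse ARF on the same energy grid as arf_mkwarf.

    emap_e0            -- rows of masked, un-normalized exposure map pixels that
                          fall inside the region, built at energy E_0
                          (s cm^2 count/photon); masked pixels are 0
    pixel_area_arcsec2 -- sky area of one exposure map pixel (arcsec^2), assumed
                          identical for every pixel
    exposure_s         -- EXPOSURE keyword of the observation (s)
    arf_mkwarf         -- mkwarf effective areas on an energy grid (cm^2 count/photon)
    arf_mkwarf_e0      -- mkwarf effective area at E_0 (same units)
    """
    # Normalization at E_0: integrate the exposure map over the region and
    # divide by the exposure time (Equation 12).
    integral = sum(sum(row) for row in emap_e0) * pixel_area_arcsec2
    arf_region_e0 = integral / exposure_s

    # Energy dependence: rescale the mkwarf curve so that it passes through
    # the normalization found at E_0.
    return [arf_region_e0 * a / arf_mkwarf_e0 for a in arf_mkwarf]
```

For example, a region of four 0.25 arcsec^2 pixels, one of them masked, with an exposure map value of 3e6 s cm^2 count/photon in a 10 ks observation, gives a normalization of 225 cm^2 count/photon arcsec^2 at E_0; the unobserved pixel lowers the result, which is the behavior that distinguishes this quantity from the mkwarf product.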

To summarize this section, we calibrate each diffuse extraction by constructing ARF_R(E), which has units 
of cm^2 count photon^-1 arcsec^2 and accounts for the area on the sky lost to point source masks, bad pixels, and 
detector edges. "Flux" quantities subsequently derived should be interpreted as surface flux quantities, with arcsec^-2 
appended to the units. Flux integrated over the diffuse region is estimated by multiplying the surface flux by the 
region's total geometric area on the sky. 



9.4. Background in Diffuse Extractions 



Diffuse sources are often heavily contaminated by background, both instrumental background and emission 
from foreground and background astrophysical sources that are not of interest. Several strategies to account for 
background in diffuse sources are described in the AE manual. We choose to subtract from each diffuse extraction a 
standard instrumental background spectrum obtained by scaling the so-called "ACIS stowed event data" provided 
in the Chandra Calibration Database (Hickox & Markevitch 2006). After merging the multiple observations of a 
diffuse region (with the same methods that are used for point sources, with no optimization options enabled), 
standard "source" and "background" spectra with calibration data products are available. Astrophysical background 
contaminating a diffuse region can be handled in either of two ways. First, if a nearby "sky" region thought to be 
mostly free from the emission under study is available, then its observations and "stowed backgrounds" are extracted, 
and two net spectra from the diffuse region and the sky region are simultaneously fit using a shared model for the 
astrophysical background in both regions plus a source model for the emission of interest in the diffuse region. 
Alternatively, if no suitable "sky" region is available, then the astrophysical background must be directly modeled 
(Snowden et al. 2008). If appropriate fitting scripts are constructed, AE can automate this spectral fitting in the 
same way that point source spectral fitting is automated. 



9.5. Visualizing Model Parameters for Spectra of Diffuse Structures 

When more than a few diffuse regions are defined, many observers have found that spectral fitting results can 
be best understood by creating maps that show various results, e.g., model parameters and fluxes. Standard gray-scale 
or false-color maps effectively depict to the human eye spatial variations in a parameter. We are currently 
experimenting with a complementary map-coloring technique that seeks to depict both a parameter's value and its 
uncertainty, using hue to encode the parameter value and brightness to encode its uncertainty, as shown in Figure 16. 



http://cxc.harvard.edu/ciao/threads/acisbackground/index.py.html 
Fig. 16. — AE analysis of the Chandra Carina Complex Project (Townsley et al. 2010) study of the Carina Nebula 
(see Figure 2), a region with strong diffuse emission with complex morphology. Diffuse regions (yellow) are defined 
by tessellating an image of diffuse emission using the WVT Binning algorithm (Diehl & Statler 2006). In this example 
map the color inside each region represents the absorption parameter (N_H) from the spectral model derived from 
the corresponding extracted spectrum. As shown in the legend, the hue (red ... blue) of the color encodes the N_H 
value (relative to the median): red hues represent values 50% below the median; green hues represent the median 
value, 0.29; blue hues represent values 50% above the median. The brightness of the color encodes the uncertainty 
of that value. For example, a highly certain low N_H value would be bright red; a highly uncertain low value would 
be maroon. Regions where the parameter was frozen in the fit are marked with white diagonal lines. The inset plot 
shows the distribution of mapped N_H values (in units of 10^22 cm^-2) with the values corresponding to the red, green, 
and blue hues marked by vertical lines. 
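
The hue and brightness encoding described in this caption can be sketched as below. This is an illustrative guess at one possible mapping, not the code used to make the figure: the linear hue ramp between the red, green, and blue anchor points, the linear brightness scale, and the parameters `max_rel_uncertainty` and `min_brightness` are all assumptions.

```python
import colorsys

def parameter_color(value, median, rel_uncertainty,
                    max_rel_uncertainty=1.0, min_brightness=0.4):
    """Map a fitted parameter and its relative uncertainty to an RGB triple.

    Hue runs from red (value 50% below the median) through green (median)
    to blue (50% above the median); values outside that range are clipped.
    Brightness falls linearly from 1.0 (no uncertainty) to min_brightness
    (relative uncertainty >= max_rel_uncertainty). Both scales are
    hypothetical choices made for this sketch.
    """
    ratio = min(max(value / median, 0.5), 1.5)
    hue = (ratio - 0.5) * (2.0 / 3.0)          # 0 = red, 1/3 = green, 2/3 = blue
    frac = min(max(rel_uncertainty / max_rel_uncertainty, 0.0), 1.0)
    brightness = 1.0 - frac * (1.0 - min_brightness)
    return colorsys.hsv_to_rgb(hue, 1.0, brightness)
```

With these choices a well-constrained low value maps to bright red and a poorly constrained low value to a dark red, matching the "bright red" versus "maroon" example in the caption.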

10. VISUALIZATION 

Visualization of data products throughout the analysis workflow shown in Figure 1 can reveal various problems 
that commonly arise, including artifacts in the data, mistakes in execution of the workflow such as skipping a step 
or failing to recognize a failure, bugs in software tools or unexpected changes in the algorithms implemented by 
tools, astrometric offsets in observations, or uncommon features in the data such as CCD readout streaks, severely 
piled-up sources, and sources lying on the edge of or just outside the field of view. We display the events removed 
from the observation at each cleaning step of the L1-to-L2 processing (§3) to look for unexpected patterns. We plot 
candidate point sources arising from the detection process (§4.2) on top of the event data to look for duplicates and 
spurious detections. As the candidates are extracted by AE, each observation's source apertures are examined in 
DS9 to verify that they are not overlapping (§5.1). If AE masks the data to remove point sources (§5.4), then the 
resulting background data set is scanned to determine if additional masking is required (e.g., around very bright 
sources). Candidate sources that are found to be not significant (likely background fluctuations) are reviewed before 
they are pruned from the catalog (§4.1). All AE source position estimates (§7.1) that differ significantly from the 
original position assigned by the detection process are visually confirmed before the revised position is adopted. 

Many visual reviews consist of displaying a project-level event list (constructed by merging all the observations) 
in DS9, overlaid with color-coded DS9 regions depicting sources of interest. AE provides a complementary tool (the 
"SHOW" option) that examines a single source in detail, as shown in Figure 5. The neighborhood around the source 
in each observation is shown in a separate DS9 frame, overlaid with the extraction aperture; another frame shows 
the merged neighborhood. If a reconstructed image of the merged neighborhood is available, then it is shown in an 
additional frame. Basic information about each extraction is presented in tabular form. 

After the catalog is extracted and source properties are collated, we plot the distributions of key source properties, 
looking for outliers that signify mistakes (or discoveries). Light curves are examined for sources exhibiting strong 
variability (§7.6). Results from spectral fitting are reviewed using a custom tool designed for that purpose (§7.5). 



11. SUMMARY 

Many Chandra-ACIS imaging studies face significant data analysis challenges arising from large numbers of weak 
and sometimes crowded point sources embedded in scientifically relevant diffuse emission, observed with multiple 
misaligned pointings. We have discussed here a variety of innovations to standard ACIS analysis methods that 
address these challenges; the most important of these are summarized below. 

1. Currently, a single set of data cleaning procedures is not adequate if the observer plans to study both very weak 
(<10 counts) point sources (or diffuse emission) and bright point sources, because several cleaning algorithms 
remove legitimate X-ray events from bright sources. Thus, we find that distinct data cleaning procedures are 
required for different types of analyses (§3). 

2. When point sources are to be extracted, we recommend evaluating the existence of sources using those extraction 
results rather than relying on typical source detection tools for that judgment (§4.1). 

3. In crowded fields we recommend searching for candidate sources in reconstructed images (§4.2), defining 
extraction apertures that do not overlap (§5.1), and defining background regions that seek to model the background 
contributed by the wings of nearby sources (§5.4). 

4. We recommend correcting all point source extractions for the energy-dependent fraction of light that lies outside 
the source aperture (§5.3). 

5. When a source is observed multiple times, we recommend estimating source validity, position, and photometry 
using three independent combinations of the extractions, each allowed to discard observations to optimize the 
accuracy of the corresponding quantity (§6.2). 

6. We offer an algorithm for grouping spectra (§7.2) that lessens biases inherent in standard algorithms. 

7. We raise concerns about employing under-constrained background models in the context of spectral fitting and 
offer an alternative (§7.5). 

8. When diffuse X-rays are extracted from a region that contains unobserved areas (e.g., due to detector edges or 
point source masks) we describe a necessary correction to the calibration provided by the tool mkwarf (§9.3). 
Most of the data analysis methods we have discussed are implemented in the ACIS Extract (AE) software 
package, which has been freely available to the community since its development began in 2002. AE is written in 
the IDL language, and makes extensive use of tools in CIAO and in several other public packages. 

Although much of the analysis we perform on our ACIS observations has been automated, we believe that 
the observer should retain many vital roles in the process. The human eye is often able to spot omissions and 
spurious entries in the set of candidate sources derived from detection procedures (§4.2). We rely on the observer to 
judge what effect CCD readout streaks may have on the data analysis and to take appropriate mitigating actions 
(§5.4). Scientific judgement is required to select appropriate levels of smoothing for diffuse sources (§9.1) and to 
define appropriate regions to extract (§9.2). We encourage the observer to visually review (§10) data cleaning steps, 
extraction apertures, catalog pruning and source repositioning proposed by algorithms, spectral fitting results, and 
multi-wavelength associations asserted by our matching algorithm. 
appreciate the time and useful suggestions contributed by our anonymous referee. We would have been lost without 

A. RUNNING TWO AFTERGLOW ALGORITHMS 

We concur with the CXC's recommendation to use both afterglow tools (acis_detect_afterglow and acis_run_hotpix) 
for identifying afterglow events (§3) when seeking to detect weak sources. Since both algorithms use bit 16 in the 
event STATUS word to represent afterglows, the operational details of applying both algorithms can be confusing. 
We show below one method for doing this. First, the afterglow flag returned by the gentle algorithm in the tool 
acis_run_hotpix is moved from bit 16 to the unused bit 31 (dmtcalc call below). Second, the aggressive algorithm in 
the tool acis_detect_afterglow is applied, storing its results in bits 16-19. (Note that in the dmtcalc syntax below bits 
are numbered 0 to 31, counting from right to left.) 

    dmtcalc infile=acis.evt1 outfile=temp.evt \
        expression="status=status,status=X0F,if(status==X15T)then(status=X0T)"

    acis_detect_afterglow infile=temp.evt outfile=acis.evt1 \
        pha_rules=NONE fltgrade_rules=NONE



Gentle cleaning that does not damage bright point sources (§3) can be performed on the resulting STATUS 
word by ignoring the bits assigned by the destreak tool (bit 15), the acis_detect_afterglow tool (bits 16-19), and the 
Very Faint Mode grading algorithm (bit 23): 

    dmcopy "acis.evt1[status=00000000x000xxxxx000000000000000]" gently_cleaned.evt



Development of AE is on-going; observers can be notified of new releases by joining the AE mailing list 
(http://lists.psu.edu/cgi-bin/wa?A0=L-ASTRO-ACIS-EXTRACT). 

http://cxc.harvard.edu/ciao/threads/acishotpixels/index.html#caveats 

http://space.mit.edu/CXC/docs/docs.html#evtbits 
Aggressive cleaning that is suitable for source detection and for analysis of diffuse emission can be performed 
by requiring all STATUS bits to be zero: 

    dmcopy "acis.evt1[status=0]" aggressively_cleaned.evt
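
The two STATUS filters used in this appendix can be checked with a small sketch. It is not part of CIAO or AE; the helper names are hypothetical. It assumes the bit numbering used in the text (bit 0 is the least significant bit of the 32-bit STATUS word) and that the leftmost character of the dmcopy filter string corresponds to bit 31.

```python
def bits(*positions):
    """Return an integer mask with the given bit positions set (bit 0 = LSB)."""
    mask = 0
    for b in positions:
        mask |= 1 << b
    return mask

# Bits ignored by the gentle cleaning: destreak (15),
# acis_detect_afterglow (16-19), and Very Faint Mode grading (23).
IGNORED_GENTLE = bits(15, 16, 17, 18, 19, 23)

def dm_filter_string(ignored_mask):
    """Render a 32-character filter: 'x' = ignored bit, '0' = bit must be zero."""
    return "".join("x" if (ignored_mask >> b) & 1 else "0" for b in range(31, -1, -1))

def passes_gentle(status):
    """True if every bit outside the ignored set is zero."""
    return (status & ~IGNORED_GENTLE & 0xFFFFFFFF) == 0

def passes_aggressive(status):
    """True only if all STATUS bits are zero."""
    return status == 0
```

Under these assumptions `dm_filter_string(IGNORED_GENTLE)` reproduces the 32-character pattern in the gentle dmcopy filter, and an event flagged only in bit 31 (the relocated gentle afterglow flag) is rejected by both filters, while an event flagged only in bit 16 survives the gentle filter.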



B. SOURCE SIGNIFICANCE 



AE provides two measures that quantify the validity of a source. First, a traditional "signal-to-noise ratio" is 
defined to be the ratio of the net source counts observed to the estimated uncertainty on that quantity. 

Second, the null hypothesis that a source does not exist in the source aperture is tested by the method described 
by Weisskopf et al. (2007, Appendix A2), which is derived under the more physical assumption that X-ray extractions 
follow Poisson distributions. Assuming the null hypothesis (all the events in the source aperture are background), 
they show that the joint probability of finding at least the observed number of events in the source aperture and 
finding the observed number of events in the background region can be found by integrating a binomial distribution. 
This calculation can be performed with the following call to the binomial function in IDL: 

$$ P_B = \mathrm{Binomial}\left(C^s;\; C^s + C^b,\; \frac{1}{1 + A^b/A^s}\right), \qquad (B1) $$

where C^s and C^b are the number of counts observed in the source aperture and background region in a specified 
energy band. The source aperture and background region "areas," A^s and A^b, are discussed in §5.4. 



This derivation takes into account both the uncertainty of estimating the background level (i.e., the Poisson 
nature of C^b) and the uncertainty of the events observed within the source aperture (i.e., the Poisson nature of 
C^s). Thus, the significance of a given source extraction will tend to decrease (P_B will rise) if its background region 
is reduced in size because fewer background counts (C^b) will be detected (regardless of whether the background 
surface brightness inferred from C^b increases or decreases). This behavior can be intuitively rephrased as "weak 
sources benefit more than strong sources from accurate estimates of the background." AE sizes background regions 
so that Poisson uncertainty on the background contributes no more than 3% of the total uncertainty on the source 
photometry. When C^b is large and the background is thereby accurately estimated, P_B approaches the familiar 
integral of the Poisson distribution over the interval [C^s, ∞] (Weisskopf et al. 2007, Appendix A2), 

$$ P_B \approx 1 - \sum_{i=0}^{C^s - 1} \mathrm{Poisson}\left(i;\; (A^s/A^b)\, C^b\right). \qquad (B2) $$



We recommend using the quantity P_B as the principal measure of the validity of a source's existence, in the 
context of the iterative source detection strategy described in §4. P_B can be read as "the probability that all counts 
in the source aperture are background" or "the probability that no source exists at the extracted location in the 
presence of the observed local background." 
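
A minimal numerical sketch of this significance test follows. It is not AE's IDL code. It assumes that, under the null hypothesis, each of the C^s + C^b events falls in the source aperture with probability 1/(1 + A^b/A^s), and that the binomial function returns the probability of observing at least C^s such events; the Python function and argument names are hypothetical.

```python
import math

def binomial_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

def prob_no_source(c_src, c_bkg, area_src, area_bkg):
    """Binomial estimate of P_B, under the assumptions stated above."""
    p = 1.0 / (1.0 + area_bkg / area_src)
    return binomial_tail(c_src, c_src + c_bkg, p)

def prob_no_source_poisson(c_src, c_bkg, area_src, area_bkg):
    """Poisson limit of P_B for a well-measured background (large c_bkg)."""
    mean = (area_src / area_bkg) * c_bkg   # expected background in the aperture
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(c_src))
    return 1.0 - cdf
```

For 5 counts in the aperture and 100 counts in a background region 100 times larger, both functions give a probability of roughly 0.004, and the two estimates converge as the background region grows. The direct summation used here is only practical for modest numbers of counts; a library routine for the regularized incomplete beta function would be preferable for large samples.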



C. MODELING THE POINT SPREAD FUNCTION 



The Chandra PSF varies both with position on the detector and with energy. Several models of the Chandra 
High Resolution Mirror Assembly (HRMA) are available, including ChaRT, SAOTrace, MARX, and the mkpsf tool in 
CIAO. ChaRT has an interactive interface and is thus not suited for automated processing; SAOTrace is available on 
only a limited set of computer platforms. The mkpsf output has technical limitations such as coarse spatial sampling 
across the detector, and omission of PSF blurring effects not related to the HRMA. We therefore use the MARX 
ray-trace simulator, running simulations for each observation of each source at five monochromatic energies: 0.277, 
1.4967, 4.51, 6.4, and 8.6 keV. MARX dithers the simulated source using the observation's aspect file, allowing 
accurate modeling of distortions caused by the PSF dithering over the boundaries of the ACIS CCDs. 



These are the five PSF energies used by the Chandra PSF Library (http://cxc.harvard.edu/ciao/dictionary/psflib.html). 
On-axis, the PSF of a real source observed by ACIS differs strongly from the HRMA PSF due to three blurring 
effects. First, significant quantization effects arise because ACIS pixels do not fully sample the HRMA PSF. Second, 
the reported positions of ACIS events are reconstructed from an imperfect aspect solution that measures the dither 
motion. Third, the default CIAO pipeline adds a ±0.25" random number to each event's position. The standard 
model for these blurring effects is a Gaussian blurring function built into MARX. Calibration of this blurring model 
(the standard deviation of the Gaussian) is available only for all three blurring effects combined; this is not suitable 
for our needs because we choose to remove the event position randomization during our event pre-processing (§3). 

Thus, we choose to run MARX simulations with its post-HRMA blurring model disabled, and then blur the 
simulated event positions ourselves in two steps. First, ACIS quantization is modeled by convolving the HRMA PSF 
image with a "box kernel" sized to match the ACIS pixel. We feel a Gaussian kernel, which has infinite extent, is 
an inappropriate model for pixel quantization (and for the event position randomization, when present). Second, 
aspect reconstruction errors are modeled by convolving the PSF image with a two-dimensional Gaussian kernel 
with σ_x = σ_y = 0.07". 
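
The two-step blurring can be sketched as a pair of separable convolutions. This is an illustration only, not AE's implementation: it assumes the HRMA PSF is available as a finely sampled square-pixel image (a list of rows), takes the ACIS pixel size to be 0.492", rounds the box width to an odd number of image pixels, truncates the Gaussian at a hypothetical 4 sigma, and zero-pads at the image edges.

```python
import math

ACIS_PIXEL_ARCSEC = 0.492    # ACIS sky pixel size
ASPECT_SIGMA_ARCSEC = 0.07   # Gaussian sigma quoted in the text

def box_kernel(width_pix):
    """Normalized 1-D box kernel; the width is rounded to an odd sample count."""
    n = max(1, int(round(width_pix)))
    if n % 2 == 0:
        n += 1
    return [1.0 / n] * n

def gaussian_kernel(sigma_pix, half_width_sigmas=4.0):
    """Normalized 1-D Gaussian kernel truncated at half_width_sigmas."""
    half = max(1, int(math.ceil(half_width_sigmas * sigma_pix)))
    weights = [math.exp(-0.5 * (i / sigma_pix) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve each row with a symmetric kernel, treating off-image pixels as 0."""
    half = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(w * row[x + k - half]
                for k, w in enumerate(kernel) if 0 <= x + k - half < n)
            for x in range(n)
        ])
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def convolve_2d_separable(image, kernel):
    """Apply the same 1-D kernel along rows and then along columns."""
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))

def blur_psf(hrma_psf, image_pixel_arcsec):
    """Box kernel for pixel quantization, then Gaussian for aspect errors."""
    box = box_kernel(ACIS_PIXEL_ARCSEC / image_pixel_arcsec)
    gauss = gaussian_kernel(ASPECT_SIGMA_ARCSEC / image_pixel_arcsec)
    return convolve_2d_separable(convolve_2d_separable(hrma_psf, box), gauss)
```

Because both kernels are normalized, the total flux in the PSF image is preserved as long as the blurred profile stays inside the image. Choosing an image pixel that divides the ACIS pixel an odd number of times (for example one fifth of a pixel) avoids the rounding of the box width.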